GaussExplorer: 3D Gaussian Splatting
for Embodied Exploration and Reasoning

¹POSTECH ²KAIST ³ETRI ⁴NVIDIA
arXiv 2026

GaussExplorer targets embodied exploration and reasoning by integrating 3D Gaussian Splatting (3DGS) with Vision-Language Models (VLMs).

Abstract

We present GaussExplorer, a framework for embodied exploration and reasoning built on 3D Gaussian Splatting (3DGS). Prior approaches to language-embedded 3DGS have made meaningful progress in aligning text queries with Gaussian embeddings, but they are generally optimized for simple queries and struggle to interpret complex, compositional ones. Alternative approaches based on object-centric RGB-D structured memories provide spatial grounding but are constrained to viewpoints fixed in advance. To address these issues, GaussExplorer places Vision-Language Models (VLMs) on top of 3DGS to enable question-driven exploration and reasoning within 3D scenes. We first identify the pre-captured images most correlated with the query, then adjust them to novel viewpoints that capture the relevant visual information more accurately, giving the VLM better input for reasoning. Experiments show that our method outperforms existing methods on several benchmarks, demonstrating the effectiveness of integrating VLM-based reasoning with 3DGS for embodied tasks.

Overview

(a) We first build a semantic 3DGS scene, in which the input views and the semantic information that foundation models produce for them are lifted into 3D. (b) In initial view selection, (b-1) the query is rephrased into relevant semantic categories by an LLM, (b-2) the activated 3D Gaussians associated with those categories are grouped into spatial clusters, and (b-3) representative training camera poses covering these clusters are selected. (c) Finally, the selected views are rendered and passed to a Vision-Language Model (VLM) for fine-grained reasoning, which identifies the most informative views and produces the final answer to the query.

Embodied Question-Answering (EQA) Results

3D Referring Segmentation Results

BibTeX


    @inproceedings{yu2026gaussexplorer,
        title={GaussExplorer: 3D Gaussian Splatting for Embodied Exploration and Reasoning},
        author={Yu-Ji, Kim and Lee, Dahye and Jun-Seong, Kim and Kim, GeonU and Hyeon-Woo, Nam and Kwon, Yongjin and Wang, Yu-Chiang Frank and Choe, Jaesung and Oh, Tae-Hyun},
        booktitle={arXiv},
        year={2026}
    }