Abstract
We study active perception from first principles to argue that an autonomous agent performing active perception should maximize the mutual information that past observations possess about future ones. Doing so requires (a) a representation of the scene that summarizes past observations and the ability to update this representation to incorporate new observations (state estimation and mapping), (b) the ability to synthesize new observations of the scene (a generative model), and (c) the ability to select control trajectories that maximize predictive information (planning). This motivates a neural radiance field (NeRF)-like representation that captures photometric, geometric, and semantic properties of the scene. Such a representation is well-suited to synthesizing new observations from different viewpoints, which allows a sampling-based planner to estimate the predictive information from synthetic observations along dynamically feasible trajectories. We use active perception to explore cluttered indoor environments and employ a notion of semantic uncertainty to check for successful completion of the exploration task. We demonstrate these ideas in simulation in realistic 3D indoor environments.
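As a rough formalization (the notation here is ours, not taken verbatim from the paper), the planning objective can be sketched as selecting the control trajectory whose synthesized future observations share the most mutual information with the observations gathered so far:

$$u^{*} = \arg\max_{u \in \mathcal{U}} \; I\big(\hat{x}_{t+1:T}(u);\, x_{1:t}\big),$$

where $x_{1:t}$ denotes past observations summarized by the scene representation, $\hat{x}_{t+1:T}(u)$ denotes observations rendered by the NeRF-like model along a candidate trajectory $u$, and $\mathcal{U}$ is the set of dynamically feasible trajectories evaluated by the sampling-based planner.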
Code is available at https://github.com/grasp-1yd/Active-Perception-using-Neural-Radiance-Fields.
Full manuscript is available at https://ieeexplore.ieee.org/document/10645027

Figure: Ground-truth observation (top left) and predicted values (bottom left) of RGB, depth, and semantic segmentation after exploring Scene 2; the final NeRF mesh is shown on the right.