Active Perception using Neural Radiance Fields

Top: Trajectories of a quadrotor that actively explores a complex and cluttered indoor environment to localize all the different kinds of objects. Bottom: We build a neural radiance field (NeRF) representation of the scene to calculate this mutual information. This provides us with an accurate representation of the free space within which we can sample dynamically-feasible trajectories for a differentially-flat model of a quadrotor. This picture shows a mesh constructed from the NeRF after active exploration; color denotes objects of different categories predicted by our semantic NeRF.

Abstract

We study active perception from first principles to argue that an autonomous agent performing active perception should maximize the mutual information that past observations possess about future ones. Doing so requires (a) a representation of the scene that summarizes past observations and the ability to update this representation to incorporate new observations (state estimation and mapping), (b) the ability to synthesize new observations of the scene (a generative model), and (c) the ability to select control trajectories that maximize predictive information (planning). This motivates a neural radiance field (NeRF)-like representation that captures the photometric, geometric, and semantic properties of the scene. This representation is well-suited to synthesizing new observations from different viewpoints, so a sampling-based planner can use it to calculate the predictive information of synthesized observations along dynamically-feasible trajectories. We use active perception to explore cluttered indoor environments and employ a notion of semantic uncertainty to check for the successful completion of an exploration task. We demonstrate these ideas via simulation in realistic 3D indoor environments.
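The planning loop described above can be summarized as: sample candidate trajectories, synthesize the observations the NeRF predicts along each one, and pick the trajectory whose predicted observations are most informative. The following is a minimal sketch of that loop. Every name below (sample_trajectories, render_predictive_entropy, the pose encoding, the number of candidates) is an illustrative assumption rather than the paper's implementation; in the actual system the random placeholder renderer would be replaced by the NeRF's own photometric, geometric, and semantic uncertainties.

```python
# Minimal sketch of a sampling-based, information-maximizing planner.
# All function names and shapes here are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)


def sample_trajectories(n_traj: int, horizon: int) -> np.ndarray:
    """Placeholder: sample n_traj candidate sequences of camera poses.

    The paper's planner samples dynamically-feasible quadrotor trajectories
    (via differential flatness) inside the free space estimated by the NeRF.
    """
    return rng.uniform(-1.0, 1.0, size=(n_traj, horizon, 6))  # xyz + orientation


def render_predictive_entropy(pose: np.ndarray) -> np.ndarray:
    """Placeholder: per-pixel uncertainty of the view the NeRF would synthesize
    at `pose`. A real implementation renders RGB, depth and semantics and
    measures how uncertain those predictions are."""
    return rng.uniform(0.0, 1.0, size=(32, 32))


def information_gain(trajectory: np.ndarray) -> float:
    """Score a trajectory by the total predictive uncertainty of its views,
    a simple proxy for the mutual information new observations would carry."""
    return float(sum(render_predictive_entropy(p).mean() for p in trajectory))


candidates = sample_trajectories(n_traj=64, horizon=10)
scores = np.array([information_gain(t) for t in candidates])
best = candidates[int(np.argmax(scores))]
print(f"selected trajectory with score {scores.max():.3f}")
```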

Code is available at https://github.com/grasp-1yd/Active-Perception-using-Neural-Radiance-Fields.

Full manuscript is available at https://ieeexplore.ieee.org/document/10645027

Figure: Ground-truth observations (top left) and predicted values (bottom left) of RGB, depth, and semantic segmentation after exploring Scene 2; the final NeRF mesh is shown on the right.
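The figure compares the NeRF's synthesized RGB, depth, and semantics against ground truth; the stopping criterion mentioned in the abstract asks when those semantic predictions have become confident enough. Below is a minimal sketch of such a semantic-uncertainty check. The per-pixel entropy measure, the threshold value, and the toy class-probability maps are illustrative assumptions, not the paper's actual criterion.

```python
# Minimal sketch of a semantic-uncertainty stopping criterion (illustrative only).
import numpy as np

rng = np.random.default_rng(2)


def semantic_entropy(class_probs: np.ndarray) -> np.ndarray:
    """Per-pixel Shannon entropy of predicted class probabilities (H, W, C)."""
    p = np.clip(class_probs, 1e-8, 1.0)
    return -(p * np.log(p)).sum(axis=-1)


def exploration_complete(views: list, max_mean_entropy: float = 0.2) -> bool:
    """Declare exploration done once every rendered view's mean semantic
    entropy falls below a (hypothetical) threshold."""
    return all(semantic_entropy(v).mean() < max_mean_entropy for v in views)


# Toy example: 4 rendered views, 32x32 pixels, 10 semantic classes.
logits = rng.normal(scale=5.0, size=(4, 32, 32, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print("exploration complete:", exploration_complete(list(probs)))
```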