Abstract
Active perception approaches select future viewpoints by using some estimate of the information gain. An inaccurate estimate can be detrimental in critical situations, e.g., locating a person in distress. However, the true information gained can only be calculated post hoc, i.e., after the observation is realized. We present an approach to estimate the discrepancy between the estimated information gain (which is the expectation over putative future observations while neglecting correlations among them) and the true information gain. The key idea is to analyze the mathematical relationship between active perception and the estimation error of the information gain in a game-theoretic setting. Using this, we develop an online estimation approach that achieves sub-linear regret (in the number of timesteps) for the estimation of the true information gain and reduces the sub-optimality of active perception systems. We demonstrate our approach for active perception using a comprehensive set of experiments on: (a) different types of environments, including a quadrotor in a photorealistic simulation, real-world robotic data, and real-world experiments with ground robots exploring indoor and outdoor scenes; (b) different types of robotic perception data; and (c) different map representations. On average, our approach reduces information gain estimation errors by 42%, increases the information gain by 7%, PSNR by 5%, and semantic accuracy (measured as the number of objects that are localized correctly) by 6%. In real-world experiments with a Jackal ground robot, our approach generated complex trajectories to explore occluded regions. Code is available at https://github.com/grasp-lyd/active-perception-game. Proofs are available at https://arxiv.org/abs/2404.00769.
https://ieeexplore.ieee.org/abstract/document/11128798

Comparison of our approach for active perception against a baseline for a quadrotor exploring an indoor environment in a photorealistic simulator. Our approach reduces estimation errors (a), leads to a higher information gain (b), yields better reconstruction with higher peak signal-to-noise ratio (PSNR) and lower depth mean squared error (MSE) in the learned neural radiance field (NeRF) (c), and increases the total number of objects that are correctly localized in the scene (d).
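For readers unfamiliar with the reconstruction metrics in panel (c), the following is a minimal sketch of how MSE and PSNR are conventionally computed. It is not taken from the authors' code: the function names `mean_squared_error` and `psnr` are hypothetical, and it assumes both inputs are flat sequences of values (pixel intensities or depths) in the range [0, `max_val`].

```python
import math
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equally sized sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE).

    max_val is the largest value a sample can take (an assumption here:
    1.0 for images normalized to [0, 1]).
    """
    err = mean_squared_error(pred, target)
    if err == 0.0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / err)
```

Because PSNR is a decreasing function of MSE, a higher PSNR on rendered images and a lower MSE on rendered depth both indicate a reconstruction closer to the reference.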