Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

Authors: Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show the proposed model generalizes much better than geometry-unaware LSTM/GRU networks, especially in the presence of multiple objects and cross-object occlusions. We conduct extensive experiments and ablations against alternative recurrent architectures in both "easy" and "hard" environments.
Researcher Affiliation | Academia | Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki; Carnegie Mellon University, Pittsburgh, PA 15213; {ricsonc,ziyanw1}@andrew.cmu.edu, katef@cs.cmu.edu
Pseudocode | No | The paper describes mathematical equations for the GRU update and outlines the steps in prose, but does not include a clearly labeled pseudocode or algorithm block (a standard convolutional-GRU sketch is given after this table for reference).
Open Source Code | Yes | Our code will be made available at ricsonc.github.io/geometryaware.
Open Datasets | Yes | We test our models in two types of simulated environments: i) scenes we create using synthetic 3D object models from ShapeNet [5]; the generated scenes contain single or multiple objects on a table surface. ii) scenes from the SUNCG [33] dataset, sampled around an object of interest.
Dataset Splits | Yes | We split the data into training, validation, and test according to [5]. We use 70% of the scenes for training and the rest for testing.
Hardware Specification | No | The paper mentions consuming 'lots of GPU memory' but does not specify any particular GPU models, CPU types, or other hardware specifications used for experiments.
Software Dependencies | No | The paper discusses neural network architectures and models like GRU and LSTM, but does not specify versions of programming languages, libraries, or other software dependencies used for implementation.
Experiment Setup | Yes | We train our policy network using REINFORCE [34] with a Monte Carlo baseline. We train for 3D reconstruction (voxel occupancy) using a standard binary cross-entropy loss. We train for object instance segmentation by learning voxel segmentation embeddings [13]. Clustering using the learnt embedding distances provides a set of voxel clusters, ideally each one capturing an object in the scene. We use metric learning and a standard contrastive loss [12], which brings segmentation embeddings of voxels of the same object instance close and pushes segmentation embeddings of different object instances (and the empty space) apart. During training, we sample the same number of voxel examples from each object instance to keep the loss balanced, not proportional to the volume of each object. We stop merging when R < 1.5. We train our policy using three random seeds and report the mean and variance. (A sketch of the balanced contrastive loss described here is given after this table.)
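For reference on the Pseudocode row above: the paper gives equations for its GRU-style memory update rather than an algorithm block. The sketch below is the textbook GRU gating written as a 3D-convolutional cell in PyTorch; the class name, kernel size, and the 3D-convolutional form are illustrative assumptions, not the paper's exact geometry-aware update.

```python
# Minimal sketch (assumption: PyTorch; a standard ConvGRU cell, not the paper's
# exact geometry-aware memory update).
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Standard GRU gating applied with 3D convolutions over a feature grid."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv3d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)  # update gate z, reset gate r
        self.cand = nn.Conv3d(in_ch + hid_ch, hid_ch, k, padding=k // 2)       # candidate state

    def forward(self, x, h):
        # x: current input features, h: previous memory, both shaped (B, C, D, H, W).
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde  # gated interpolation between old and candidate memory
```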
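The Experiment Setup row above describes metric learning of voxel segmentation embeddings with a standard contrastive loss and balanced per-instance sampling. The sketch below, assuming PyTorch, a flattened voxel grid, integer instance labels (0 for empty space), and an illustrative margin and per-instance sample count, shows one way such balanced contrastive training could look; it is not the authors' implementation.

```python
# Minimal sketch (assumptions: PyTorch; names, margin, and sample count are illustrative).
import torch

def balanced_contrastive_loss(embeddings, labels, samples_per_instance=128, margin=1.0):
    """Contrastive loss over voxel segmentation embeddings.

    embeddings: (N, D) tensor of per-voxel embeddings (flattened grid).
    labels:     (N,) integer instance ids, 0 = empty space.
    Samples the same number of voxels from each instance so the loss is balanced,
    not proportional to the volume of each object (per the quoted setup).
    """
    device = embeddings.device
    sampled = []
    for i in labels.unique():
        idx = (labels == i).nonzero(as_tuple=True)[0]
        # Sample with replacement so small and large instances contribute equally.
        pick = idx[torch.randint(len(idx), (samples_per_instance,), device=device)]
        sampled.append(pick)
    sampled = torch.cat(sampled)
    emb, lab = embeddings[sampled], labels[sampled]

    # Pairwise distances and a same-instance mask over the balanced sample.
    dist = torch.cdist(emb, emb)
    same = lab.unsqueeze(0) == lab.unsqueeze(1)

    # Standard contrastive loss: pull same-instance pairs together,
    # push different-instance pairs at least `margin` apart.
    pos = dist[same].pow(2).mean()
    neg = (margin - dist[~same]).clamp(min=0).pow(2).mean()
    return pos + neg
```

Clustering voxels by distance in the learnt embedding space (e.g. greedily merging nearby voxels until the stopping criterion quoted above) would then yield the per-object voxel clusters.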