Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Authors: Nanbo Li, Cian Eastwood, Robert Fisher

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments are designed to demonstrate that MulMON is a feasible solution to the MOMV problem, and that MulMON learns better representations than the MOSV and SOMV models by resolving spatial ambiguity. To do so, we compare the performance of MulMON against two baseline models, IODINE [9] (MOSV) and GQN [8] (SOMV), in terms of segmentation, viewpoint-queried prediction (appearance and segmentation), and disentanglement (inter- and intra-object).
Researcher Affiliation | Academia | Li Nanbo, School of Informatics, University of Edinburgh (nanbo.li@ed.ac.uk); Cian Eastwood, School of Informatics, University of Edinburgh (c.eastwood@ed.ac.uk); Robert B. Fisher, School of Informatics, University of Edinburgh (rbf@inf.ed.ac.uk)
Pseudocode | Yes | Algorithm 1: MulMON at Test Time: Online Scene Learning
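Algorithm 1 describes processing a scene's views one at a time while refining a scene-level latent. The sketch below mirrors only that observe-then-refine loop structure; `encode_view` and the running-average update are toy stand-ins invented here for illustration, not MulMON's amortized variational inference.

```python
import numpy as np

def encode_view(image, viewpoint):
    # Hypothetical stand-in encoder: summarizes an HxWx3 image by its
    # per-channel mean and appends the viewpoint vector. Not the
    # paper's network.
    return np.concatenate([image.mean(axis=(0, 1)), viewpoint])

def online_scene_learning(views, latent_dim=6, steps_per_view=3):
    """Process views sequentially, refining a scene latent after each.

    Mirrors the shape of the paper's Algorithm 1 (observe a view,
    iteratively refine the latent, move on), but the update rule here
    is a toy incremental mean rather than variational inference.
    """
    z = np.zeros(latent_dim)          # current scene-latent estimate
    n = 0
    for image, viewpoint in views:
        evidence = encode_view(image, viewpoint)
        for _ in range(steps_per_view):
            n += 1
            z += (evidence - z) / n   # incremental-mean refinement
    return z

# Toy usage: four 8x8 RGB views with 3-D viewpoint vectors.
views = [(np.full((8, 8, 3), 0.5), np.zeros(3)) for _ in range(4)]
z = online_scene_learning(views)
```

The point of the structure is that the latent is updated online, view by view, so new observations can be folded in at test time without reprocessing the whole scene.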
Open Source Code | Yes | Code available at https://github.com/NanboLi/MulMON
Open Datasets | Yes | To best facilitate these comparisons, we created two new datasets called CLEVR6-MultiView (abbr. CLE-MV) and CLEVR6-Augmented (abbr. CLE-Aug), which contain ground-truth segmentation masks and shape descriptions (e.g. colors, materials). The CLE-MV dataset is a multi-view variant (10 views per scene) of the CLEVR6 dataset [13, 9]. CLE-Aug adds more complex shapes (e.g. horses, ducks, and teapots) to the CLE-MV environment. In addition, we compare the models on the GQN-Jaco dataset [8] and use the GQN-Shepard-Metzler7 dataset [8] (abbr. Shep7) for a specific ablation study.
Dataset Splits | No | The paper describes how a scene's observations are partitioned into an observation set (T) for training and a query set (Q) for novel-viewpoint prediction, but it does not provide explicit train/validation/test splits, with percentages or counts, for the overall datasets used (CLE-MV, CLE-Aug, GQN-Jaco).
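The within-scene T/Q partition mentioned above is a simple random split of a scene's views. A minimal sketch, assuming only that each scene's views are indexed and split without overlap (the function name and seed handling are illustrative, not the paper's code):

```python
import random

def split_views(view_indices, num_obs, seed=0):
    """Partition one scene's views into an observation set T and a
    held-out query set Q for novel-viewpoint prediction."""
    rng = random.Random(seed)       # per-scene reproducible shuffle
    shuffled = list(view_indices)
    rng.shuffle(shuffled)
    T = sorted(shuffled[:num_obs])  # views the model observes
    Q = sorted(shuffled[num_obs:])  # viewpoints it must predict
    return T, Q

# Toy usage for a CLE-MV-style scene with 10 views, observing 7.
T, Q = split_views(range(10), num_obs=7)
```

Note this is a per-scene view split, which is exactly why it does not answer the dataset-splits question: it says nothing about how whole scenes are divided into train/validation/test sets.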
Hardware Specification | No | The paper mentions "the GPU computing support from Dr. Zhibin Li's Advanced Intelligence Robotics Lab at the University of Edinburgh" but does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | We train all models using an Adam optimizer with an initial learning rate of 0.0003 for 300k gradient steps.
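The quoted setup pins down the optimizer (Adam, Kingma & Ba) and its initial learning rate of 3e-4. As a sanity check of what that update rule does, here is a minimal NumPy implementation of one Adam step applied to a toy quadratic objective; the objective and step count are stand-ins, not the paper's losses or schedule:

```python
import numpy as np

def adam_step(theta, grad, state, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with the paper's initial learning rate 3e-4
    and the standard default betas/epsilon."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy check: minimize f(theta) = ||theta||^2 for a few steps.
theta = np.ones(4)
state = (np.zeros(4), np.zeros(4), 0)
for _ in range(1000):
    grad = 2 * theta
    theta, state = adam_step(theta, grad, state)
```

In practice one would use a framework optimizer (e.g. `torch.optim.Adam(model.parameters(), lr=3e-4)` in PyTorch) for the 300k-step training run; the paper does not state which framework or version was used.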