Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
Authors: Yinxuan Huang, Chengmin Gao, Bin Li, Xiangyang Xue
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our proposed model, we focus on four tasks: unsupervised object segmentation, scene reconstruction, compositional generation, and novel viewpoint synthesis. For the first two tasks, we compare our active viewpoint selection strategy with the random viewpoint selection strategy to highlight its superiority. Additionally, we evaluate our model against other multi-viewpoint approaches, including SIMONe [7] and OCLOC [9, 10], as well as the single-image-based method LSD [11]. |
| Researcher Affiliation | Academia | Yinxuan Huang, Chengmin Gao, Bin Li, Xiangyang Xue; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; yxhuang22@m.fudan.edu.cn, {19210240036, libin, xyxue}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1: Multi-Viewpoint Slot Attention; Algorithm 2: Active Viewpoint Selection Algorithm |
| Open Source Code | Yes | We will provide open access to the data and code on GitHub at https://github.com/YinxuanH/active-viewpoint-selection. |
| Open Datasets | Yes | We generated three synthetic multi-object multi-viewpoint datasets, referred to as CLEVRTEX, GSO, and ShapeNet, to evaluate the performance of our model. These datasets were constructed based on the CLEVRTEX dataset [25], the GSO dataset [26], and the ShapeNet dataset [27], respectively. They were created using the official code provided by CLEVRTEX [25] and Kubric [28]. |
| Dataset Splits | Yes | Table 2: Configurations of datasets. CLEVRTEX: 5000 train / 100 valid / 100 test images; GSO/ShapeNet: 5000 train / 100 valid / 100 test images. |
| Hardware Specification | Yes | We train our model on 4 NVIDIA RTX 4090 GPUs over 4.5 days, while SIMONe is trained in 1.5 days, OCLOC in 2.5 days, and LSD in 1.5 days, all using the same GPU setup. |
| Software Dependencies | No | The paper mentions software like PyTorch and DINO but does not specify their version numbers (e.g., "We implement SIMONe using the PyTorch framework." and "Following DINOSAUR [32], we utilize the pretrained DINO to extract features from images"). |
| Experiment Setup | Yes | Table 3: Hyperparameters of our model used in experiments. Lists detailed parameters such as Batch Size, Training Steps, Input Resolution, Patch Size, Channel Multipliers, Learning Rate, # Iterations, Slot Attr Size, Slot View Size, # Slots, etc. for various modules. |
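The Hardware Specification row reports wall-clock training time on a shared 4-GPU setup. Converting these figures to GPU-days makes the cost comparison explicit. The sketch below assumes that all four GPUs were used for the entire reported duration of each run. The names `gpu_days` and `relative_cost` are hypothetical helpers introduced only for illustration and are not part of the released code.

```python
# Wall-clock training times from the Hardware Specification row.
# Assumption: every model used all 4 NVIDIA RTX 4090 GPUs for the full run.
NUM_GPUS = 4
wall_clock_days = {
    "Ours": 4.5,
    "SIMONe": 1.5,
    "OCLOC": 2.5,
    "LSD": 1.5,
}


def gpu_days(days: float, num_gpus: int = NUM_GPUS) -> float:
    """Convert wall-clock days on the shared setup into GPU-days."""
    return days * num_gpus


def relative_cost(model: str, reference: str = "Ours") -> float:
    """Training time of `model` expressed as a multiple of `reference`'s."""
    return wall_clock_days[model] / wall_clock_days[reference]


if __name__ == "__main__":
    for name, days in wall_clock_days.items():
        print(f"{name:7s} {gpu_days(days):5.1f} GPU-days "
              f"({relative_cost(name, 'SIMONe'):.2f}x SIMONe)")
```

Under this assumption, the proposed model needs 18 GPU-days, three times the budget of SIMONe or LSD (6 GPU-days each) and 1.8 times that of OCLOC (10 GPU-days).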
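The split sizes in Table 2 can also be written as a small configuration that reproducers can check a regenerated dataset against. GSO and ShapeNet share a single column in the table, so the sketch assumes that each of them uses that split. The dictionary layout and the helper names `total_images` and `held_out_fraction` are hypothetical and do not reflect how the authors' code stores its configuration.

```python
from typing import Dict

# Split sizes as listed in Table 2 ("Configurations of datasets").
# Assumption: the shared GSO/ShapeNet column applies to each dataset separately.
SPLITS: Dict[str, Dict[str, int]] = {
    "CLEVRTEX": {"train": 5000, "valid": 100, "test": 100},
    "GSO": {"train": 5000, "valid": 100, "test": 100},
    "ShapeNet": {"train": 5000, "valid": 100, "test": 100},
}


def total_images(dataset: str) -> int:
    """Number of images summed over all splits of one dataset."""
    return sum(SPLITS[dataset].values())


def held_out_fraction(dataset: str) -> float:
    """Share of a dataset reserved for validation and testing."""
    split = SPLITS[dataset]
    return (split["valid"] + split["test"]) / total_images(dataset)


if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name}: {total_images(name)} images, "
              f"{held_out_fraction(name):.1%} held out")
    print("all datasets:", sum(total_images(n) for n in SPLITS))
```

Each dataset therefore contains 5200 images, about 3.8% of which are held out. The three datasets together contain 15,600 images.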