Cycle Consistency Driven Object Discovery
Authors: Aniket Rajiv Didolkar, Anirudh Goyal, Yoshua Bengio
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance. These enhancements consistently hold true across both synthetic and real-world scenes, underscoring the effectiveness and adaptability of the proposed approach. To tackle the second limitation, we apply the learned object-centric representations from the proposed method to two downstream reinforcement learning tasks, demonstrating considerable performance enhancements compared to conventional slot-based and monolithic representation learning methods. |
| Researcher Affiliation | Collaboration | 1 Mila, University of Montreal, 2 Google DeepMind |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For object discovery we consider both synthetic and real-world datasets. For the synthetic datasets, we use ShapeStacks (Groth et al., 2018), Objects Room (Kabra et al., 2019), and ClevrTex (Karazija et al., 2021). For real-world datasets, we consider the task of multi-object segmentation on the COCO (Lin et al., 2014) and ScanNet (Dai et al., 2017) datasets. We also apply the proposed approach to object discovery in videos, where we consider the MOVi-E video dataset (Greff et al., 2022). For our downstream RL tasks we use the Atari and CausalWorld environments (Ahmed et al., 2020). For foreground extraction, we use the Stanford Dogs dataset (Khosla et al., 2012), Stanford Cars dataset (Krause et al., 2013), CUB200 Birds dataset (Wah et al., 2011), and Flowers dataset (Nilsback & Zisserman, 2006). |
| Dataset Splits | Yes | We follow the setup used in (Dittadi et al., 2022). We follow the exact setup from Decision Transformer (Chen et al., 2021) for this experiment. These references imply the use of standard train/validation/test splits, which are well-defined for the datasets mentioned (e.g., Atari, COCO, ClevrTex). |
| Hardware Specification | Yes | For each experiment, we use 1 RTX8000 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'PPO' but does not provide specific version numbers for these or other libraries/frameworks used. |
| Experiment Setup | Yes | To set the hyperparameters for our approach, we select λ_sfs and λ_fsf as 0.1 and 0.01, respectively, unless otherwise specified. We also employ an additional Exponential Moving Average (EMA) visual encoder. We set τ_1 to 0.1 and τ_2 to 0.01 unless specified otherwise. More details regarding the architecture and hyperparameters can be found in the Appendix. Table 7: This table indicates all the values for various hyperparameters used in the synthetic dataset experiments. Table 11: This table shows various hyperparameters used in the real-world dataset experiments where we use BO-Slate as the base model. Table 14: Here we present the values for the various hyperparameters used in the slot attention module for the decision transformer experiments. (A sketch of this setup is given after this table.) |
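
The experiment setup above names two auxiliary cycle-consistency objectives weighted by λ_sfs = 0.1 and λ_fsf = 0.01, softmax temperatures τ_1 = 0.1 and τ_2 = 0.01, and an EMA visual encoder. The following is a minimal sketch of how such a setup could look, not the authors' implementation: the dot-product similarity, the round-trip cross-entropy form, the EMA decay of 0.999, and all function names here are assumptions.

```python
import torch
import torch.nn.functional as F


def cycle_consistency_losses(slots, feats, tau1=0.1, tau2=0.01, eps=1e-8):
    """Sketch of SLOT-FEATURE-SLOT and FEATURE-SLOT-FEATURE losses.

    slots: (B, K, D) slot representations from the slot-based model.
    feats: (B, N, D) visual features (e.g., from the EMA encoder).
    The similarity measure and loss form are illustrative assumptions.
    """
    B, K, _ = slots.shape
    N = feats.shape[1]

    sim = torch.einsum('bkd,bnd->bkn', slots, feats)     # (B, K, N)
    s2f = F.softmax(sim / tau1, dim=-1)                  # slots -> features
    f2s = F.softmax(sim.transpose(1, 2) / tau2, dim=-1)  # features -> slots

    # SLOT-FEATURE-SLOT: the slot -> feature -> slot round trip should
    # land back on the starting slot, i.e. the (B, K, K) round-trip
    # matrix should be close to the identity.
    sfs = torch.bmm(s2f, f2s)
    slot_targets = torch.arange(K, device=slots.device).repeat(B)
    loss_sfs = F.nll_loss(torch.log(sfs + eps).flatten(0, 1), slot_targets)

    # FEATURE-SLOT-FEATURE: the symmetric round trip for features.
    fsf = torch.bmm(f2s, s2f)
    feat_targets = torch.arange(N, device=feats.device).repeat(B)
    loss_fsf = F.nll_loss(torch.log(fsf + eps).flatten(0, 1), feat_targets)

    return loss_sfs, loss_fsf


@torch.no_grad()
def ema_update(ema_encoder, encoder, decay=0.999):
    """Move the EMA visual encoder toward the online encoder.
    The decay value is an assumption; the paper only states that an
    EMA encoder is used."""
    for p_ema, p in zip(ema_encoder.parameters(), encoder.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```

Under these assumptions, the two auxiliary losses would be added to the base slot model's reconstruction objective with the reported weights, e.g. `total_loss = recon_loss + 0.1 * loss_sfs + 0.01 * loss_fsf`, with `ema_update` called once per training step.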