Improving Object-centric Learning with Query Optimization

Authors: Baoxiong Jia, Yu Liu, Siyuan Huang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose to address these issues by investigating the potential of learnable queries as initializations for Slot-Attention learning, uniting it with efforts from existing attempts on improving Slot-Attention learning with bi-level optimization. With simple code adjustments on Slot-Attention, our model, Bi-level Optimized Query Slot Attention, achieves state-of-the-art results on 3 challenging synthetic and 7 complex real-world datasets in unsupervised image segmentation and reconstruction, outperforming previous baselines by a large margin. We provide thorough ablative studies to validate the necessity and effectiveness of our design. Additionally, our model exhibits great potential for concept binding and zero-shot learning. Our work is made publicly available at https://bo-qsa.github.io.
Researcher Affiliation | Collaboration | Baoxiong Jia (1,3), Yu Liu (2,3), Siyuan Huang (3); 1 UCLA, 2 Tsinghua University, 3 National Key Laboratory of General Artificial Intelligence, BIGAI
Pseudocode | Yes | Algorithm 1: BO-QSA (a hedged re-implementation sketch is given after this table)
Open Source Code | Yes | Our work is made publicly available at https://bo-qsa.github.io.
Open Datasets | Yes | For the synthetic domain, we select three well-established challenging multi-object datasets, ShapeStacks (Groth et al., 2018), Objects Room (Kabra et al., 2019), and CLEVRTex, for evaluating our BO-QSA model. ... For the real image domain, we use two tasks, (1) unsupervised foreground extraction and (2) unsupervised multi-object segmentation, for evaluating our method. Specifically, we select Stanford Dogs (Khosla et al., 2011), Stanford Cars (Krause et al., 2013), CUB200 Birds (Welinder et al., 2010), and Flowers (Nilsback & Zisserman, 2010) as our benchmarking datasets for foreground extraction, and YCB (Calli et al., 2017), ScanNet (Dai et al., 2017), and COCO (Lin et al., 2014) proposed by Yang & Yang (2022) for multi-object segmentation.
Dataset Splits | No | The paper does not explicitly provide the specific percentages or absolute numbers for training, validation, and test dataset splits. While it mentions training steps and batch sizes, it lacks clear reproduction details for data partitioning beyond referencing established datasets.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as CPU or GPU models, or cloud computing instances with detailed specifications.
Software Dependencies | No | The paper mentions "simple code adjustments" but does not list specific software dependencies (e.g., libraries, frameworks) with version numbers required for reproducibility.
Experiment Setup | Yes | We train our model for 250k steps with a batch size of 128 and describe all training configurations and hyperparameter selection in Tab. 11. ... Table 11: Batch Size 128, LR 4e-4, Slot Dim 64, MLP Hidden Dim 128, Warmup Steps 5k, Decay Steps 50k, Max Steps 250k, Sigma Down Steps 30k. ... Table 12: Batch Size 128, Warmup Steps 10000, LR 1e-4, Max Steps 250k, Vocabulary Size 1024, Gumbel-Softmax Annealing Range 1.0 to 0.1, Gumbel-Softmax Annealing Steps 30000, LR dVAE (no warmup) 3e-4. ... Transformer Decoder: 4 layers, 4 heads, dropout 0.1, hidden dimension 256. ... Slot Attention Module: slot dimension 256, 3 iterations, σ annealing steps 30000 (0). (A training-schedule sketch based on these settings appears below.)
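
The Algorithm 1 referenced in the Pseudocode row is not reproduced in this summary, so the following is a minimal PyTorch sketch of the idea described in the abstract: Slot-Attention whose slots are initialized from learnable queries, with the inner refinement iterations detached so that only a final attention step, and the queries themselves via a straight-through connection, receive gradients. The class names (SlotAttentionStep, BOQSA), the straight-through handling of the queries, and the num_slots=7 choice in the example are illustrative assumptions, not the authors' released implementation; consult https://bo-qsa.github.io for the exact algorithm.

```python
import torch
import torch.nn as nn


class SlotAttentionStep(nn.Module):
    """One simplified Slot-Attention iteration (Locatello et al., 2020)."""

    def __init__(self, slot_dim: int, input_dim: int, mlp_hidden: int):
        super().__init__()
        self.norm_inputs = nn.LayerNorm(input_dim)
        self.norm_slots = nn.LayerNorm(slot_dim)
        self.norm_mlp = nn.LayerNorm(slot_dim)
        self.to_q = nn.Linear(slot_dim, slot_dim, bias=False)
        self.to_k = nn.Linear(input_dim, slot_dim, bias=False)
        self.to_v = nn.Linear(input_dim, slot_dim, bias=False)
        self.gru = nn.GRUCell(slot_dim, slot_dim)
        self.mlp = nn.Sequential(
            nn.Linear(slot_dim, mlp_hidden), nn.ReLU(),
            nn.Linear(mlp_hidden, slot_dim),
        )
        self.scale = slot_dim ** -0.5

    def forward(self, slots, inputs):
        # slots: (B, K, slot_dim); inputs: (B, N, input_dim)
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        q = self.to_q(self.norm_slots(slots))
        attn = torch.softmax(torch.einsum('bkd,bnd->bkn', q, k) * self.scale, dim=1)
        attn = attn / attn.sum(dim=-1, keepdim=True)   # slots compete over inputs
        updates = torch.einsum('bkn,bnd->bkd', attn, v)
        slots = self.gru(updates.reshape(-1, updates.size(-1)),
                         slots.reshape(-1, slots.size(-1))).view(slots.shape)
        return slots + self.mlp(self.norm_mlp(slots))


class BOQSA(nn.Module):
    """Sketch of Bi-level Optimized Query Slot Attention.

    Slots start from learnable queries; the inner refinement iterations run
    without gradient, and only a single final step is backpropagated, with a
    straight-through connection so the queries receive outer-loop gradients.
    """

    def __init__(self, num_slots=7, slot_dim=64, input_dim=64,
                 mlp_hidden=128, num_iters=3):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_slots, slot_dim) * 0.02)
        self.step = SlotAttentionStep(slot_dim, input_dim, mlp_hidden)
        self.num_iters = num_iters

    def forward(self, inputs):
        init = self.queries.expand(inputs.size(0), -1, -1)
        with torch.no_grad():                      # inner loop: slot refinement
            slots = init
            for _ in range(self.num_iters):
                slots = self.step(slots, inputs)
        slots = slots + (init - init.detach())     # straight-through to queries
        return self.step(slots, inputs)            # single differentiated step


# Example with the Table 11 settings (slot dim 64, MLP hidden 128, 3 iterations);
# num_slots=7 and the 32x32 feature grid are illustrative, not from the paper.
model = BOQSA(num_slots=7, slot_dim=64, input_dim=64, mlp_hidden=128, num_iters=3)
out = model(torch.randn(4, 32 * 32, 64))           # -> (4, 7, 64) slot vectors
```

The detach-then-one-step pattern is what makes the procedure "bi-level" in spirit: the inner loop refines slots for fixed queries, while the outer reconstruction objective updates the queries and network weights through the single differentiated step.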
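
The Table 11 settings quoted in the Experiment Setup row can likewise be turned into a training-loop skeleton. The sketch below assumes linear warmup followed by exponential decay with a decay rate of 0.5, the schedule form used in the original Slot-Attention recipe; the quoted text gives only the step counts and base learning rate, so the decay rate and the placeholder parameters are assumptions.

```python
import torch

# Hypothetical stand-in for the model parameters (see the BO-QSA sketch above).
params = [torch.nn.Parameter(torch.zeros(64, 64))]
optimizer = torch.optim.Adam(params, lr=4e-4)       # LR 4e-4 (Table 11)


def lr_lambda(step, warmup_steps=5_000, decay_steps=50_000, decay_rate=0.5):
    # Linear warmup over 5k steps, then exponential decay ("Decay Steps 50k").
    # decay_rate=0.5 is assumed from the Slot-Attention recipe, not stated here.
    warmup = min(step / warmup_steps, 1.0)
    return warmup * (decay_rate ** (step / decay_steps))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(250_000):                          # Max Steps 250k, batch size 128
    # forward pass, loss, and backward pass on a batch of 128 images go here
    optimizer.step()
    scheduler.step()
```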