Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos

Authors: Gautam Singh, Yi-Fu Wu, Sungjin Ahn

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiment results on various complex and naturalistic videos show significant improvements compared to the previous state-of-the-art.
Researcher Affiliation | Academia | Gautam Singh, Rutgers University, singh.gautam@rutgers.edu; Yi-Fu Wu, Rutgers University, yifu.wu@gmail.com; Sungjin Ahn, KAIST, sjn.ahn@gmail.com
Pseudocode | No | The paper describes the model architecture and training process using text and mathematical equations, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | In supplementary material, we provide the architectural details and the hyperparameters used for our experiments. We will share the code and our proposed datasets at https://sites.google.com/view/slot-transformer-for-videos.
Open Datasets | Yes | We evaluate our model on 8 datasets. These include 6 procedurally generated datasets: CATER (Girdhar & Ramanan, 2020), CATERTex, MOVi-Solid, MOVi-Tex, MOVi-D, and MOVi-E (Greff et al., 2022); and 2 natural datasets: Traffic and Aquarium. ... For existing benchmarks i.e. CATER, MOVi-D and MOVi-E, we use the standard train and test splits as prescribed by their respective authors. We will release all the proposed datasets which, we believe, will facilitate future research.
Dataset Splits | No | The paper mentions "standard train and test splits" for existing benchmarks, but it does not explicitly state a validation split, nor does it provide specific percentages or counts for any validation set used. (A hypothetical sketch of holding out a validation set from the training split is given after the table.)
Hardware Specification | Yes | We assessed this by training the models on two RTX 3090 GPUs. For the image size 64x64, training for 1M steps took around 2-3 days, and for image size 128x128, it took around 5-6 days. The max GPU memory usage for 64x64 was around 16GB and for 128x128 around 22GB. The batch size for all models was 1 for both 64x64 and 128x128.
Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | In supplementary material, we provide the architectural details and the hyperparameters used for our experiments. The reproducibility checklist question "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?" is answered [Yes].
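
Because the paper relies on each benchmark's standard train/test splits and does not name a validation set, a reproducer who wants one has to hold it out from the training data themselves. The snippet below is a minimal, hypothetical sketch of that step; the 90/10 ratio, the fixed seed, and the `video_paths` argument are assumptions made for illustration, not choices taken from the paper.

```python
import random

def split_train_val(video_paths, val_fraction=0.1, seed=0):
    """Hold out a validation subset from a benchmark's standard training split.

    The paper does not specify a validation set, so the fraction and seed
    here are illustrative assumptions rather than the authors' settings.
    """
    rng = random.Random(seed)
    paths = list(video_paths)
    rng.shuffle(paths)
    n_val = max(1, int(len(paths) * val_fraction))
    # The first n_val shuffled paths form the validation set; the rest stay in train.
    return paths[n_val:], paths[:n_val]

# Hypothetical usage on MOVi-E training shards:
# train_videos, val_videos = split_train_val(sorted(glob.glob("movi_e/train/*.tfrecord")))
```

Fixing the seed keeps the held-out set identical across runs, which matters when comparing checkpoints selected on it.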