Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos
Authors: Gautam Singh, Yi-Fu Wu, Sungjin Ahn
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiment results on various complex and naturalistic videos show significant improvements compared to the previous state-of-the-art. |
| Researcher Affiliation | Academia | Gautam Singh, Rutgers University, singh.gautam@rutgers.edu; Yi-Fu Wu, Rutgers University, yifu.wu@gmail.com; Sungjin Ahn, KAIST, sjn.ahn@gmail.com |
| Pseudocode | No | The paper describes the model architecture and training process using text and mathematical equations, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | In supplementary material, we provide the architectural details and the hyperparameters used for our experiments. We will share the code and our proposed datasets at https://sites.google.com/view/slot-transformer-for-videos. |
| Open Datasets | Yes | We evaluate our model on 8 datasets. These include 6 procedurally generated datasets: CATER (Girdhar & Ramanan, 2020), CATERTex, MOVi-Solid, MOVi-Tex, MOVi-D, and MOVi-E (Greff et al., 2022); and 2 natural datasets: Traffic and Aquarium. ... For existing benchmarks i.e. CATER, MOVi-D and MOVi-E, we use the standard train and test splits as prescribed by their respective authors. We will release all the proposed datasets which, we believe, will facilitate future research. |
| Dataset Splits | No | The paper mentions "standard train and test splits" for existing benchmarks, but it does not explicitly state a validation split, nor does it provide specific percentages or counts for any validation set used. |
| Hardware Specification | Yes | We assessed this by training the models on two RTX 3090 GPUs. For the image size 64x64, training for 1M steps took around 2-3 days, and for image size 128x128, it took around 5-6 days. The max GPU memory usage for 64x64 was around 16GB and for 128x128 around 22GB. The batch size for all models was 1 for both 64x64 and 128x128. |
| Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | In supplementary material, we provide the architectural details and the hyperparameters used for our experiments. NeurIPS checklist: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?" [Yes] |
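
The table above quotes the reported compute figures only in prose. As a minimal, hypothetical sketch (not the authors' code), the snippet below consolidates those quoted figures into a small runnable summary that may help readers plan a reproduction; the class and field names are illustrative assumptions.

```python
# Hypothetical sketch: consolidates only the compute figures quoted in the
# "Hardware Specification" row above. Not the authors' code; names are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ReportedSetup:
    image_size: int              # image resolution (pixels per side)
    batch_size: int              # batch size quoted above
    train_steps: int             # 1M training steps for both resolutions
    gpus: str                    # hardware quoted above
    approx_days: str             # reported wall-clock training time
    approx_peak_gpu_mem_gb: int  # reported max GPU memory usage

REPORTED = [
    ReportedSetup(64, 1, 1_000_000, "2x RTX 3090", "2-3", 16),
    ReportedSetup(128, 1, 1_000_000, "2x RTX 3090", "5-6", 22),
]

if __name__ == "__main__":
    for s in REPORTED:
        print(f"{s.image_size}x{s.image_size}: batch={s.batch_size}, "
              f"steps={s.train_steps:,}, {s.gpus}, ~{s.approx_days} days, "
              f"~{s.approx_peak_gpu_mem_gb} GB peak GPU memory")
```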