Self-supervised Object-Centric Learning for Videos
Authors: Görkay Aydemir, Weidi Xie, Fatma Güney
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From §4.1 (Experimental Setup): Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25], a widely-used benchmark for evaluating object-centric methods, particularly for multi-object segmentation in videos. ... Metrics: For our synthetic dataset evaluation, we use the foreground adjusted Rand index (FG-ARI) to measure the quality of clustering into multiple foreground objects. (A hedged sketch of FG-ARI appears after the table.) |
| Researcher Affiliation | Academia | Görkay Aydemir (1), Weidi Xie (3,4), Fatma Güney (1,2). Affiliations: (1) Department of Computer Engineering, Koç University; (2) KUIS AI Center; (3) CMIC, Shanghai Jiao Tong University; (4) Shanghai AI Laboratory |
| Pseudocode | No | The paper describes its methodology and architecture in text and figures, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://kuis-ai.github.io/solv |
| Open Datasets | Yes | Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25]... Additionally, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set... For real-world datasets, we use the validation split of DAVIS17 [65]. |
| Dataset Splits | Yes | For real-world datasets, we use the validation split of DAVIS17 [65]. In addition, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set, because there is no official validation or test set provided with ground-truth masks. |
| Hardware Specification | Yes | We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48. |
| Software Dependencies | No | The paper mentions using 'the sklearn library [64]' for Agglomerative Clustering but does not provide a specific version number for scikit-learn or any other software dependency. (A hedged sketch of the clustering call follows the table.) |
| Experiment Setup | Yes | We set the consecutive frame range n to 2 and drop half of the tokens before the slot attention step. We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48. We clip the gradient norms at 1 to stabilize the training. ... MOVi-E: We train our model from scratch for a total of 60 epochs... We use a maximum learning rate of 4 × 10⁻⁴ and an exponential decay schedule... The model is trained using 18 slots and the input frames are adjusted to a size of 336 × 336... The slot merge coefficient in ψ_merge is configured to 0.12. (A minimal training-loop sketch follows the table.) |
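
The FG-ARI metric cited in the Research Type row is standard in the object-centric literature. Below is a minimal sketch, assuming the common convention of computing the adjusted Rand index only over pixels whose ground-truth label is foreground (label 0 taken as background, as in MOVi); the paper's exact implementation is not quoted in the excerpts above.

```python
# Hedged sketch of FG-ARI: adjusted Rand index restricted to foreground pixels.
# Assumes ground-truth label 0 denotes background, per the usual MOVi convention;
# the paper's exact implementation may differ.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def fg_ari(gt_masks: np.ndarray, pred_masks: np.ndarray) -> float:
    """gt_masks, pred_masks: integer segmentation maps of identical shape."""
    gt = gt_masks.ravel()
    pred = pred_masks.ravel()
    fg = gt > 0  # keep foreground pixels only
    return adjusted_rand_score(gt[fg], pred[fg])

# Toy usage: a prediction that matches ground truth up to label permutation
# scores a perfect 1.0.
gt = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 5, 5], [7, 7, 3]])
print(fg_ari(gt, pred))  # 1.0
```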
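The Software Dependencies row notes that slot merging relies on sklearn's Agglomerative Clustering without a pinned version. Here is a minimal sketch of what such a merge step could look like, assuming cosine distance with average linkage and reusing the 0.12 merge coefficient from the Experiment Setup row as the distance threshold; the paper's exact distance function and aggregation are assumptions here.

```python
# Hedged sketch: merging redundant slots with sklearn Agglomerative Clustering.
# The 0.12 threshold comes from the paper's reported merge coefficient; the
# cosine/average-linkage choice and mean-pooling of merged slots are assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def merge_slots(slots: np.ndarray, threshold: float = 0.12) -> np.ndarray:
    """slots: (num_slots, slot_dim) array; returns (num_clusters, slot_dim)."""
    clustering = AgglomerativeClustering(
        n_clusters=None,             # let the distance threshold set the count
        distance_threshold=threshold,
        metric="cosine",             # this parameter is 'affinity' in scikit-learn < 1.2
        linkage="average",
    )
    labels = clustering.fit_predict(slots)
    # Average the slot vectors that fall into the same cluster.
    return np.stack([slots[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

slots = np.random.randn(18, 128).astype(np.float32)  # 18 slots, as on MOVi-E
merged = merge_slots(slots)
print(merged.shape)  # (num_clusters, 128), with num_clusters <= 18
```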
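The Experiment Setup row lists the optimizer, gradient clipping, and schedule in prose. The following PyTorch sketch wires those reported values together; the model, data, and the `gamma=0.999` decay rate are placeholders, since the paper's exact decay constant is not quoted in the excerpts.

```python
# Hedged sketch of the reported training configuration:
# Adam, peak LR 4e-4, exponential decay, gradient-norm clipping at 1, batch size 48.
# The model, data, and gamma value are illustrative stand-ins.
import torch
from torch import nn

model = nn.Linear(128, 128)  # stand-in for the actual SOLV model
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

for step in range(100):  # stand-in for 60 epochs on MOVi-E
    batch = torch.randn(48, 128)  # batch size 48, as reported
    loss = model(batch).pow(2).mean()  # dummy reconstruction-style loss
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms at 1 to stabilize training, as in the paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```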