Self-supervised Object-Centric Learning for Videos

Authors: Görkay Aydemir, Weidi Xie, Fatma Güney

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 4.1 (Experimental Setup): "Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25], a widely-used benchmark for evaluating object-centric methods, particularly for multi-object segmentation in videos. ... Metrics: For our synthetic dataset evaluation, we use the foreground adjusted rand index (FG-ARI) to measure the quality of clustering into multiple foreground objects."
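For context, FG-ARI is the adjusted Rand index computed only over pixels labeled as foreground in the ground truth. A minimal sketch of that computation, assuming per-pixel instance masks (the paper's own evaluation code is not quoted here, and the function name fg_ari is illustrative):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def fg_ari(true_mask, pred_mask, bg_label=0):
    """Foreground Adjusted Rand Index (FG-ARI).

    true_mask, pred_mask: integer arrays of per-pixel instance ids
    with identical shapes; bg_label marks background in the ground
    truth and is excluded before scoring.
    """
    true_flat = true_mask.ravel()
    pred_flat = pred_mask.ravel()
    fg = true_flat != bg_label              # keep foreground pixels only
    return adjusted_rand_score(true_flat[fg], pred_flat[fg])

# Toy check: two objects with swapped labels still score perfectly,
# since ARI is invariant to label permutation.
gt   = np.array([[0, 1, 1], [0, 2, 2]])
pred = np.array([[0, 2, 2], [0, 1, 1]])
print(fg_ari(gt, pred))  # 1.0
```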
Researcher Affiliation | Academia | Görkay Aydemir (1), Weidi Xie (3, 4), Fatma Güney (1, 2); (1) Department of Computer Engineering, Koç University; (2) KUIS AI Center; (3) CMIC, Shanghai Jiao Tong University; (4) Shanghai AI Laboratory
Pseudocode | No | The paper describes its methodology and architecture in text and figures, but it does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Project page: https://kuis-ai.github.io/solv"
Open Datasets | Yes | "Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25]... Additionally, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set... For real-world datasets, we use the validation split of DAVIS17 [65]."
Dataset Splits | Yes | "For real-world datasets, we use the validation split of DAVIS17 [65]. In addition, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set, because there is no official validation or test set provided with ground-truth masks."
Hardware Specification | Yes | "We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48."
Software Dependencies | No | The paper mentions using "the sklearn library [64]" for Agglomerative Clustering but does not provide a specific version number for scikit-learn or any other software dependencies with their versions.
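To illustrate the dependency in question, the slot-merging step could be reproduced with scikit-learn along these lines. This is a sketch under stated assumptions, not the released code: the cosine metric and average linkage are guesses, and the paper's merge coefficient of 0.12 is reused here as the distance threshold. The `metric` keyword itself requires scikit-learn >= 1.2, which is exactly the kind of version constraint the paper leaves unstated:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def merge_slots(slots, threshold=0.12):
    """Group similar slot vectors and average each group.

    slots: (num_slots, dim) array of slot representations.
    threshold: cosine distance below which slots are merged; the
    paper's merge coefficient of 0.12 is reused here (assumption).
    """
    clustering = AgglomerativeClustering(
        n_clusters=None,               # let the threshold decide
        distance_threshold=threshold,  # merge below this distance
        metric="cosine",               # assumed; needs sklearn >= 1.2
        linkage="average",             # assumed linkage criterion
    )
    labels = clustering.fit_predict(slots)
    merged = np.stack([slots[labels == c].mean(axis=0)
                       for c in np.unique(labels)])
    return merged, labels

slots = np.random.rand(18, 128)        # 18 slots, as in the MOVi-E setup
merged, labels = merge_slots(slots)
print(merged.shape, labels)
```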
Experiment Setup | Yes | "We set the number of consecutive frame range n to 2 and drop half of the tokens before the slot attention step. We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48. We clip the gradient norms at 1 to stabilize the training. ... MOVi-E: We train our model from scratch for a total of 60 epochs... We use a maximum learning rate of 4 × 10^-4 and an exponential decay schedule... The model is trained using 18 slots and the input frames are adjusted to a size of 336 × 336... The slot merge coefficient in ψ_merge is configured to 0.12."
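Taken together, the quoted hyper-parameters map onto a training loop roughly as follows. This is a hedged PyTorch sketch, not the authors' released code: the model and loss are stand-ins, and the per-step decay factor gamma is an assumption, since the paper states only that an exponential decay schedule is used.

```python
import torch
from torch import nn

# Hyper-parameters quoted from the paper's MOVi-E configuration.
NUM_SLOTS, IMAGE_SIZE, BATCH_SIZE = 18, 336, 48
EPOCHS, MAX_LR, GRAD_CLIP = 60, 4e-4, 1.0

model = nn.Linear(8, 8)                # stand-in for the SOLV model
optimizer = torch.optim.Adam(model.parameters(), lr=MAX_LR)
# Exponential decay from the maximum learning rate; the gamma value
# is an assumption (the paper does not report it).
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9997)

def train_step(batch):
    loss = model(batch).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms at 1 to stabilize training, as quoted above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    scheduler.step()
    return loss.item()

print(train_step(torch.randn(BATCH_SIZE, 8)))
```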