Self-supervised Object-Centric Learning for Videos

Authors: Görkay Aydemir, Weidi Xie, Fatma Güney

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 4.1 (Experimental Setup): "Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25], a widely-used benchmark for evaluating object-centric methods, particularly for multi-object segmentation in videos. ... Metrics: For our synthetic dataset evaluation, we use the foreground adjusted rand index (FG-ARI) to measure the quality of clustering into multiple foreground objects."
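For context, FG-ARI is the adjusted Rand index computed only over pixels labeled as foreground in the ground truth. A minimal sketch of that computation, assuming per-pixel instance masks (the paper's own evaluation code is not quoted here, and the function name fg_ari is illustrative):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def fg_ari(true_mask, pred_mask, bg_label=0):
    """Foreground Adjusted Rand Index (FG-ARI).

    true_mask, pred_mask: integer arrays of per-pixel instance ids
    with identical shapes; bg_label marks background in the ground
    truth and is excluded before scoring.
    """
    true_flat = true_mask.ravel()
    pred_flat = pred_mask.ravel()
    fg = true_flat != bg_label              # keep foreground pixels only
    return adjusted_rand_score(true_flat[fg], pred_flat[fg])

# Toy check: two objects with swapped labels still score perfectly,
# since ARI is invariant to label permutation.
gt   = np.array([[0, 1, 1], [0, 2, 2]])
pred = np.array([[0, 2, 2], [0, 1, 1]])
print(fg_ari(gt, pred))  # 1.0
```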
Researcher Affiliation | Academia | Görkay Aydemir (1), Weidi Xie (3, 4), Fatma Güney (1, 2); (1) Department of Computer Engineering, Koç University; (2) KUIS AI Center; (3) CMIC, Shanghai Jiao Tong University; (4) Shanghai AI Laboratory
Pseudocode | No | The paper describes its methodology and architecture in text and figures, but it does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Project page: https://kuis-ai.github.io/solv"
Open Datasets | Yes | "Datasets: Our proposed method is evaluated on one synthetic and two real-world video datasets. For the synthetic dataset, we select MOVi [25]... Additionally, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set... For real-world datasets, we use the validation split of DAVIS17 [65]."
Dataset Splits | Yes | "For real-world datasets, we use the validation split of DAVIS17 [65]. In addition, we evaluate our method on a subset of the Youtube-VIS 2019 (YTVIS19) [87] train set, because there is no official validation or test set provided with ground-truth masks."
Hardware Specification | Yes | "We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48."
Software Dependencies | No | The paper mentions using "the sklearn library [64]" for Agglomerative Clustering but does not provide a specific version number for scikit-learn or any other software dependencies with their versions.
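To illustrate the dependency in question, the slot-merging step could be reproduced with scikit-learn along these lines. This is a sketch under stated assumptions, not the released code: the cosine metric and average linkage are guesses, and the paper's merge coefficient of 0.12 is reused here as the distance threshold. The `metric` keyword itself requires scikit-learn >= 1.2, which is exactly the kind of version constraint the paper leaves unstated:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def merge_slots(slots, threshold=0.12):
    """Group similar slot vectors and average each group.

    slots: (num_slots, dim) array of slot representations.
    threshold: cosine distance below which slots are merged; the
    paper's merge coefficient of 0.12 is reused here (assumption).
    """
    clustering = AgglomerativeClustering(
        n_clusters=None,               # let the threshold decide
        distance_threshold=threshold,  # merge below this distance
        metric="cosine",               # assumed; needs sklearn >= 1.2
        linkage="average",             # assumed linkage criterion
    )
    labels = clustering.fit_predict(slots)
    merged = np.stack([slots[labels == c].mean(axis=0)
                       for c in np.unique(labels)])
    return merged, labels

slots = np.random.rand(18, 128)        # 18 slots, as in the MOVi-E setup
merged, labels = merge_slots(slots)
print(merged.shape, labels)
```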
Experiment Setup | Yes | "We set the number of consecutive frame range n to 2 and drop half of the tokens before the slot attention step. We train our models on 2 V100 GPUs using the Adam [39] optimizer with a batch size of 48. We clip the gradient norms at 1 to stabilize the training. ... MOVi-E: We train our model from scratch for a total of 60 epochs... We use a maximum learning rate of 4 × 10^-4 and an exponential decay schedule... The model is trained using 18 slots and the input frames are adjusted to a size of 336 × 336... The slot merge coefficient in ψ_merge is configured to 0.12."
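Taken together, the quoted hyper-parameters map onto a training loop roughly as follows. This is a hedged PyTorch sketch, not the authors' released code: the model and loss are stand-ins, and the per-step decay factor gamma is an assumption, since the paper states only that an exponential decay schedule is used.

```python
import torch
from torch import nn

# Hyper-parameters quoted from the paper's MOVi-E configuration.
NUM_SLOTS, IMAGE_SIZE, BATCH_SIZE = 18, 336, 48
EPOCHS, MAX_LR, GRAD_CLIP = 60, 4e-4, 1.0

model = nn.Linear(8, 8)                # stand-in for the SOLV model
optimizer = torch.optim.Adam(model.parameters(), lr=MAX_LR)
# Exponential decay from the maximum learning rate; the gamma value
# is an assumption (the paper does not report it).
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9997)

def train_step(batch):
    loss = model(batch).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms at 1 to stabilize training, as quoted above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    scheduler.step()
    return loss.item()

print(train_step(torch.randn(BATCH_SIZE, 8)))
```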