Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Dense Unsupervised Learning for Video Segmentation
Authors: Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the learned feature representations, we conduct experiments in the setting of semi-supervised VOS. The task provides a set of segmentation masks for the first frame in a video sequence and requires the evaluated algorithm to densely track the demarcated objects in the remaining frames. We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42]. |
| Researcher Affiliation | Academia | Nikita Araslanov¹, Simone Schaub-Meyer¹, Stefan Roth¹,²; ¹Department of Computer Science, TU Darmstadt; ²hessian.AI |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm,' nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Code (Apache-2.0 License) available at https://github.com/visinf/dense-ulearn-vos. |
| Open Datasets | Yes | We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42]. ... The OxUvA dataset [35] spans 366 video sequences with a total duration of 14 hours. The second dataset is YouTube-VOS [42]... In addition, we train on larger datasets, TrackingNet [26] and Kinetics-400 [7]. |
| Dataset Splits | Yes | To evaluate on DAVIS-2017 [29] val, we independently train our feature extractor on 4 datasets. ... Following Lai et al. [19], we additionally evaluate our features on the YouTube-VOS 2018 valid split. |
| Hardware Specification | Yes | We train our models on one A100 GPU, although training our most accurate configuration of the framework requires only 12GB of memory, hence a single Titan X GPU is actually sufficient. |
| Software Dependencies | No | The paper mentions optimizers like Adam and SGD and various hyperparameters but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA 11.x). |
| Experiment Setup | Yes | At training time, we first scale the video frames such that the lowest side is between 256 and 320 pixels, and extract random crops of size 256 × 256. We train our network with Adam and the learning rate 10⁻⁴ on the smaller YouTube-VOS and OxUvA, whereas we found SGD with the learning rate 10⁻³ to work better on the larger Kinetics-400 and TrackingNet datasets. We set the temperature τ = 0.05 throughout our experiments; we observed its influence on the accuracy to not be significant. The hyperparameter λ, trading off the influence of the cross-view consistency, equals 0.1 by default... |
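
The training-time cropping described in the Experiment Setup row can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the authors' repository: the helper name `sample_resize_and_crop` is hypothetical, the target length of the shorter side is assumed to be drawn uniformly from the integers 256 to 320, and only the resize and crop geometry is computed, leaving the actual image resampling to whatever image library is in use.

```python
import random


def sample_resize_and_crop(height, width, min_side=256, max_side=320,
                           crop=256, rng=random):
    """Return (new_h, new_w, top, left) for a resize followed by a random crop.

    Hypothetical helper. Assumptions: the shorter side is rescaled to a length
    drawn uniformly from [min_side, max_side] with the aspect ratio preserved,
    and the square crop position is drawn uniformly at random.
    """
    # Pick the target length of the shorter side (inclusive range).
    target_short = rng.randint(min_side, max_side)
    scale = target_short / min(height, width)
    # Rescale both sides; clamp so the crop always fits.
    new_h = max(crop, round(height * scale))
    new_w = max(crop, round(width * scale))
    # Top-left corner of a crop x crop window inside the resized frame.
    top = rng.randint(0, new_h - crop)
    left = rng.randint(0, new_w - crop)
    return new_h, new_w, top, left
```

For example, `sample_resize_and_crop(480, 854, rng=random.Random(0))` yields a resized height in [256, 320], a proportionally scaled width, and a valid top-left corner for the 256 × 256 crop. Passing a seeded `random.Random` instance keeps the sampling reproducible.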
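
The semi-supervised VOS evaluation in the Research Type row tracks first-frame masks through the remaining frames. A common way to do this with learned features is to propagate labels through a temperature-scaled softmax over feature affinities, sketched below. This is a generic illustration, not the authors' implementation: it assumes that τ = 0.05 plays the role of this softmax temperature and that features are L2-normalised, the function name `propagate_labels` is hypothetical, and further details of the paper's inference procedure (for instance, which and how many reference frames are used) are not modelled.

```python
import numpy as np


def propagate_labels(ref_feats, ref_labels, tgt_feats, tau=0.05):
    """Propagate soft labels from a reference frame to a target frame.

    Hypothetical helper.
    ref_feats:  (N, C) reference-frame features, one row per pixel or patch
    ref_labels: (N, K) one-hot or soft object labels for the reference frame
    tgt_feats:  (M, C) target-frame features
    Returns an (M, K) array of soft labels for the target frame.
    """
    # L2-normalise so that dot products are cosine similarities
    # (assumes no all-zero feature vectors).
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)
    logits = tgt @ ref.T / tau                        # (M, N) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    affinity = np.exp(logits)
    affinity /= affinity.sum(axis=1, keepdims=True)   # softmax over reference pixels
    # Each target pixel receives an affinity-weighted mix of reference labels.
    return affinity @ ref_labels
```

Taking `argmax` over the last axis of the result turns the soft labels into a hard mask. A small τ sharpens the softmax, so each target pixel draws its label mostly from its most similar reference pixels.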