Dense Unsupervised Learning for Video Segmentation

Authors: Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the learned feature representations, we conduct experiments in the setting of semi-supervised VOS. The task provides a set of segmentation masks for the first frame in a video sequence and requires the evaluated algorithm to densely track the demarcated objects in the remaining frames. We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42].
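The semi-supervised VOS protocol described above is commonly evaluated by propagating the first-frame masks through the video via feature affinity (as in Jabri et al. [17]). A minimal sketch of such label propagation, assuming L2-normalized per-pixel features and hypothetical tensor shapes (the function name and arguments are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, mask_ref, temperature=0.05, topk=10):
    """Propagate reference-frame object masks to a target frame.

    feat_ref, feat_tgt: (C, H, W) feature maps for the two frames.
    mask_ref: (K, H, W) one-hot masks for K objects in the reference frame.
    Returns soft masks (K, H, W) for the target frame.
    """
    C, H, W = feat_ref.shape
    f_ref = F.normalize(feat_ref.flatten(1), dim=0)   # (C, HW)
    f_tgt = F.normalize(feat_tgt.flatten(1), dim=0)   # (C, HW)

    # Cosine affinity between every target pixel and every reference pixel.
    affinity = f_tgt.t() @ f_ref / temperature        # (HW_tgt, HW_ref)

    # Keep only the top-k reference pixels per target pixel and
    # turn their scores into convex weights.
    vals, idx = affinity.topk(topk, dim=1)
    weights = F.softmax(vals, dim=1)                  # (HW_tgt, k)

    labels_ref = mask_ref.flatten(1).t()              # (HW_ref, K)
    mask_tgt = (weights.unsqueeze(-1) * labels_ref[idx]).sum(1)  # (HW_tgt, K)
    return mask_tgt.t().reshape(mask_ref.shape[0], H, W)
```

With identical reference and target features and `topk=1`, every pixel matches itself and the input mask is recovered exactly, which makes the routine easy to sanity-check.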
Researcher Affiliation | Academia | Nikita Araslanov (1), Simone Schaub-Meyer (1), Stefan Roth (1,2); (1) Department of Computer Science, TU Darmstadt; (2) hessian.AI; {nikita.araslanov, simone.schaub, stefan.roth}@visinf.tu-darmstadt.de
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm,' nor does it present structured steps in a code-like format.
Open Source Code | Yes | Code (Apache-2.0 License) available at https://github.com/visinf/dense-ulearn-vos.
Open Datasets | Yes | We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42]. ... The OxUvA dataset [35] spans 366 video sequences with a total duration of 14 hours. The second dataset is YouTube-VOS [42]... In addition, we train on larger datasets, TrackingNet [26] and Kinetics-400 [7].
Dataset Splits | Yes | To evaluate on DAVIS-2017 [29] val, we independently train our feature extractor on 4 datasets. ... Following Lai et al. [19], we additionally evaluate our features on the YouTube-VOS 2018 valid split.
Hardware Specification | Yes | We train our models on one A100 GPU, although training our most accurate configuration of the framework requires only 12GB of memory, hence a single Titan X GPU is actually sufficient.
Software Dependencies | No | The paper mentions optimizers like Adam and SGD and various hyperparameters but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA 11.x).
Experiment Setup | Yes | At training time, we first scale the video frames such that the lowest side is between 256 and 320 pixels, and extract random crops of size 256x256. We train our network with Adam and the learning rate 10^-4 on the smaller YouTube-VOS and OxUvA, whereas we found SGD with the learning rate 10^-3 to work better on the larger Kinetics-400 and TrackingNet datasets. We set the temperature τ = 0.05 throughout our experiments; we observed its influence on the accuracy to not be significant. The hyperparameter λ, trading off the influence of the cross-view consistency, equals 0.1 by default...
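The quoted setup above pins down the data augmentation and optimizer choices concretely. A minimal sketch of that configuration, assuming frames arrive as (C, H, W) float tensors; the function names and dataset keys are hypothetical, while the numeric values (shorter side in [256, 320], 256x256 crops, Adam at 10^-4 vs. SGD at 10^-3, τ = 0.05, λ = 0.1) are taken from the quoted text:

```python
import random
import torch
import torch.nn.functional as F

def train_transform(frame):
    """Resize so the shorter side lands in [256, 320] px, then take a random 256x256 crop."""
    c, h, w = frame.shape
    short = random.randint(256, 320)
    scale = short / min(h, w)
    nh, nw = round(h * scale), round(w * scale)
    frame = F.interpolate(frame[None], size=(nh, nw),
                          mode="bilinear", align_corners=False)[0]
    top = random.randint(0, nh - 256)
    left = random.randint(0, nw - 256)
    return frame[:, top:top + 256, left:left + 256]

def build_optimizer(params, dataset):
    """Adam (lr 1e-4) on the smaller YouTube-VOS/OxUvA; SGD (lr 1e-3) on Kinetics-400/TrackingNet."""
    if dataset in {"youtube-vos", "oxuva"}:
        return torch.optim.Adam(params, lr=1e-4)
    return torch.optim.SGD(params, lr=1e-3)

TAU = 0.05        # softmax temperature, reported as insensitive
LAMBDA_CV = 0.1   # cross-view consistency weight, default
```

The dataset-dependent optimizer switch mirrors the paper's observation that Adam suited the smaller datasets while SGD worked better at Kinetics-400/TrackingNet scale.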