Dense Unsupervised Learning for Video Segmentation
Authors: Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the learned feature representations, we conduct experiments in the setting of semi-supervised VOS. The task provides a set of segmentation masks for the first frame in a video sequence and requires the evaluated algorithm to densely track the demarcated objects in the remaining frames. We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42]. |
| Researcher Affiliation | Academia | Nikita Araslanov1 Simone Schaub-Meyer1 Stefan Roth1,2 1Department of Computer Science, TU Darmstadt 2hessian.AI {nikita.araslanov, simone.schaub, stefan.roth}@visinf.tu-darmstadt.de |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm,' nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Code (Apache-2.0 License) available at https://github.com/visinf/dense-ulearn-vos. |
| Open Datasets | Yes | We largely follow the VOS setup of Jabri et al. [17] and evaluate our method on DAVIS-2017 [35]. Following Lai et al. [19], we additionally test our approach on the YouTube-VOS val by submitting our results to an evaluation server [42]. ... The OxUvA dataset [35] spans 366 video sequences with a total duration of 14 hours. The second dataset is YouTube-VOS [42]... In addition, we train on larger datasets, TrackingNet [26] and Kinetics-400 [7]. |
| Dataset Splits | Yes | To evaluate on DAVIS-2017 [29] val, we independently train our feature extractor on 4 datasets. ... Following Lai et al. [19], we additionally evaluate our features on the YouTube-VOS 2018 valid split. |
| Hardware Specification | Yes | We train our models on one A100 GPU, although training our most accurate configuration of the framework requires only 12GB of memory, hence a single Titan X GPU is actually sufficient. |
| Software Dependencies | No | The paper mentions optimizers like Adam and SGD and various hyperparameters but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA 11.x). |
| Experiment Setup | Yes | At training time, we first scale the video frames such that the lowest side is between 256 and 320 pixels, and extract random crops of size 256 × 256. We train our network with Adam and the learning rate 10^-4 on the smaller YouTube-VOS and OxUvA, whereas we found SGD with the learning rate 10^-3 to work better on the larger Kinetics-400 and TrackingNet datasets. We set the temperature τ = 0.05 throughout our experiments; we observed its influence on the accuracy to not be significant. The hyperparameter λ, trading off the influence of the cross-view consistency, equals 0.1 by default... |
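The preprocessing and optimizer settings quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration in plain Python: the function names, the dictionary layout, and the framework-free style are assumptions, since the paper does not specify its software stack.

```python
import random

CROP = 256  # random crop size quoted in the setup

def scaled_size(h, w, short_min=256, short_max=320):
    """Return (new_h, new_w) so the shorter side lands in [short_min, short_max],
    preserving the aspect ratio (per the quoted frame-scaling step)."""
    target = random.randint(short_min, short_max)
    if h <= w:
        return target, round(w * target / h)
    return round(h * target / w), target

def random_crop_box(h, w, size=CROP):
    """Top-left corner and extent of a random size x size crop."""
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return top, left, size, size

# Optimizer choice depends on dataset size, per the quoted setup;
# tau and lambda are the shared hyperparameters from the same row.
HPARAMS = {
    "YouTube-VOS":  {"optimizer": "Adam", "lr": 1e-4},
    "OxUvA":        {"optimizer": "Adam", "lr": 1e-4},
    "Kinetics-400": {"optimizer": "SGD",  "lr": 1e-3},
    "TrackingNet":  {"optimizer": "SGD",  "lr": 1e-3},
    "temperature": 0.05,  # softmax temperature tau
    "lambda": 0.1,        # cross-view consistency weight
}
```

Applied to a 480 × 854 DAVIS-style frame, `scaled_size` yields a shorter side between 256 and 320, after which `random_crop_box` selects a valid 256 × 256 window.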