Self-supervised video pretraining yields robust and more human-aligned visual representations

Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier Hénaff

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present in Table 1 the transfer performance of VITO compared to strong supervised and self-supervised baselines on dense scene understanding (semantic segmentation and object detection), video understanding (video segmentation and action recognition), and out-of-distribution (OOD) object recognition.
Researcher Affiliation | Collaboration | Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff; Google DeepMind. Current affiliation: NYU Center for Neural Science; work done while interning at DeepMind.
Pseudocode | No | The paper describes the methodology using text and mathematical equations but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not contain an explicit statement or a direct link indicating that the source code for the VITO methodology is publicly available.
Open Datasets | Yes | We present in Table 1 the transfer performance of VITO compared to strong supervised and self-supervised baselines on dense scene understanding (semantic segmentation and object detection), video understanding (video segmentation and action recognition), and out-of-distribution (OOD) object recognition. (Table 1 lists: ADE20K, COCO, DAVIS, UCF101, IN-A, IN-Vid.)
Dataset Splits | Yes | We fine-tune for 45 epochs on the PASCAL train_aug2012 set or 60 epochs on the ADE20K train set. We report mIoU on the val set averaged across 5 runs. (A sketch of this fine-tuning and evaluation protocol follows the table.)
Hardware Specification | Yes | We pretrain ResNet-50 using the LARS optimizer [78] with a batch size of 4096 split across 128 Cloud TPU v3 workers.
Software Dependencies | No | The paper refers to codebases used (e.g., 'the mmseg codebase' and https://github.com/open-mmlab/mmsegmentation) but does not specify particular software versions for libraries or frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We pretrain ResNet-50 using the LARS optimizer [78] with a batch size of 4096 split across 128 Cloud TPU v3 workers. We adopt the optimization details of BYOL, scaling the learning rate linearly with the batch size and decaying it according to a cosine schedule. The base learning rate is 0.3 and the weight decay is 10^-6. (A sketch of this learning-rate schedule follows the table.)
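
The Experiment Setup row describes BYOL-style optimization: the learning rate is scaled linearly with the batch size and decayed on a cosine schedule. Below is a minimal Python sketch of that schedule, assuming the usual BYOL convention of scaling relative to a reference batch size of 256; the function name, the reference batch size, and the optional warmup are assumptions, not details taken from the report.

```python
import math

def pretrain_learning_rate(step, total_steps, batch_size=4096,
                           base_lr=0.3, reference_batch=256,
                           warmup_steps=0):
    """Cosine-decayed learning rate with linear batch-size scaling (BYOL-style).

    base_lr=0.3 and batch_size=4096 come from the report; the reference
    batch of 256 and the warmup are assumptions for illustration.
    """
    peak_lr = base_lr * batch_size / reference_batch  # linear scaling
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# With batch_size=4096 the peak learning rate is 0.3 * 4096 / 256 = 4.8.
# The LARS optimizer and the 10^-6 weight decay quoted in the report would
# be applied on top of this schedule; they are not reimplemented here.
```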
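
The Dataset Splits row likewise implies a concrete fine-tuning and evaluation protocol: train on the published train split for a fixed number of epochs and report val-set mIoU averaged across 5 runs. The sketch below records that protocol as an illustrative config plus an averaging helper; the dictionary layout and the `finetune_and_eval` callable are hypothetical, not the paper's actual (mmsegmentation-based) configuration format.

```python
from statistics import mean, stdev

# Illustrative configs mirroring the Dataset Splits row; the layout is an
# assumption, not the paper's configuration format.
FINETUNE_CONFIGS = {
    "pascal_voc_2012": {"train_split": "train_aug2012", "epochs": 45,
                        "eval_split": "val", "metric": "mIoU"},
    "ade20k": {"train_split": "train", "epochs": 60,
               "eval_split": "val", "metric": "mIoU"},
}

def averaged_miou(finetune_and_eval, config, seeds=(0, 1, 2, 3, 4)):
    """Fine-tune once per seed and average val-set mIoU over 5 runs, as the
    Dataset Splits row describes. `finetune_and_eval` is a placeholder for
    an unspecified training-plus-evaluation routine."""
    scores = [finetune_and_eval(config, seed=s) for s in seeds]
    return mean(scores), stdev(scores)
```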