Self-supervised video pretraining yields robust and more human-aligned visual representations
Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier Hénaff
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present in Table 1 the transfer performance of VITO compared to strong supervised and self-supervised baselines on dense scene understanding (semantic segmentation and object detection), video understanding (video segmentation and action recognition), and out-of-distribution (OOD) object recognition. |
| Researcher Affiliation | Collaboration | Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff (Google DeepMind). Current affiliation: NYU Center for Neural Science; work done while interning at DeepMind. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not contain an explicit statement or a direct link indicating that the source code for the VITO methodology described in this paper is publicly available. |
| Open Datasets | Yes | We present in Table 1 the transfer performance of VITO compared to strong supervised and self-supervised baselines on dense scene understanding (semantic segmentation and object detection), video understanding (video segmentation and action recognition), and out-of-distribution (OOD) object recognition. (Table 1 lists: ADE20K, COCO, DAVIS, UCF101, IN-A, IN-Vid). |
| Dataset Splits | Yes | We fine-tune for 45 epochs on the PASCAL train_aug 2012 set or 60 epochs on the ADE20K train set. We report mIoU on the val set averaged across 5 runs. |
| Hardware Specification | Yes | We pretrain ResNet-50 using the LARS optimizer [78] with a batch size of 4096 split across 128 Cloud TPU v3 workers. |
| Software Dependencies | No | The paper refers to codebases used (e.g., 'the mmseg codebase' and 'https://github.com/open-mmlab/mmsegmentation') but does not specify particular software versions for libraries or frameworks like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We pretrain ResNet-50 using the LARS optimizer [78] with a batch size of 4096 split across 128 Cloud TPU v3 workers. We adopt the optimization details of BYOL, scaling the learning rate linearly with the batch size and decaying it according to a cosine schedule. The base learning rate is 0.3 and the weight decay is 10^-6. |
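The pretraining recipe quoted in the last row (BYOL-style optimization with the learning rate scaled linearly by batch size and decayed on a cosine schedule) can be illustrated with a minimal sketch. This is not the authors' code: the reference batch size of 256, the absence of warmup, and the step count are assumptions; only the base learning rate (0.3), batch size (4096), and weight decay (10^-6) come from the paper.

```python
# Minimal sketch of the quoted learning-rate recipe, not the authors' implementation.
# Assumptions: reference batch size of 256 for linear scaling, no warmup,
# placeholder total step count.
import math

BASE_LR = 0.3          # base learning rate stated in the paper
BATCH_SIZE = 4096      # pretraining batch size stated in the paper
REF_BATCH_SIZE = 256   # assumed reference batch size for linear scaling
WEIGHT_DECAY = 1e-6    # weight decay stated in the paper


def scaled_lr(base_lr: float = BASE_LR,
              batch_size: int = BATCH_SIZE,
              ref_batch_size: int = REF_BATCH_SIZE) -> float:
    """Linear scaling: lr = base_lr * batch_size / ref_batch_size."""
    return base_lr * batch_size / ref_batch_size


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps."""
    progress = min(step / max(total_steps, 1), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    peak = scaled_lr()                  # 0.3 * 4096 / 256 = 4.8 under the assumed scaling
    total_steps = 100_000               # placeholder; actual count depends on epochs and dataset size
    for step in (0, 25_000, 50_000, 100_000):
        print(f"step {step:>7d}: lr = {cosine_lr(step, total_steps, peak):.4f}")
```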