Self-supervised learning through the eyes of a child

Authors: Emin Orhan, Vaibhav Gupta, Brenden M. Lake

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, our goal is precisely to achieve such progress by utilizing modern self-supervised deep learning methods and a recent longitudinal, egocentric video dataset recorded from the perspective of three young children (Sullivan et al., 2020). Our results demonstrate the emergence of powerful, high-level visual representations from developmentally realistic natural videos using generic self-supervised learning objectives.
Researcher Affiliation | Academia | A. Emin Orhan (Center for Data Science), Vaibhav V. Gupta (Center for Data Science), Brenden M. Lake (Center for Data Science and Department of Psychology), New York University; {eo41, vvg239, brenden}@nyu.edu
Pseudocode | No | The paper schematically illustrates the temporal classification objective in Figure 2 and describes the algorithms in text, but it does not include any formal pseudocode or algorithm blocks (a hedged sketch of the objective is given after this table).
Open Source Code | Yes | Pre-trained models and training/testing code are available at: https://github.com/eminorhan/baby-vision.
Open Datasets | Yes | We use the SAYCam dataset (Sullivan et al., 2020) in this study, hosted on the Databrary repository for behavioral science: https://nyu.databrary.org/.
Dataset Splits | No | The paper specifies 'random iid splits (with 50% training-50% test data)' for evaluation. While it describes training and test portions, it does not state a distinct validation split for hyperparameter tuning or early stopping (a minimal split sketch is given after this table).
Hardware Specification | No | The paper mentions training 'deep convolutional networks' and a 'large-scale model' but does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory, or cloud instances).
Software Dependencies | No | The paper refers to the 'MobileNetV2 architecture', a 'PyTorch implementation', and 'skimage.feature' but does not provide specific version numbers for any of these software components, which are necessary for full reproducibility.
Experiment Setup | Yes | Our best model is a temporal classification model that uses a sampling rate of 5 fps (frames per second), a segment length of 288 seconds, and data augmentation in the form of color and grayscale augmentations as in Chen et al. (2020a). (A hedged configuration sketch is given after this table.)
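Since the paper describes the temporal classification objective only in text and in Figure 2, the following minimal PyTorch sketch illustrates one way such an objective can be set up; it is not the authors' released code. The MobileNetV2 backbone, 5 fps sampling rate, and 288-second segment length come from the paper, while the number of segments, optimizer, learning rate, and data handling are assumptions.

```python
# Minimal sketch (not the authors' code) of a temporal classification pretext
# task: each frame is labeled with the index of the temporal segment it falls
# in, and a CNN is trained to predict that segment with cross-entropy loss.
import torch
import torch.nn as nn
import torchvision.models as models

FPS = 5                  # sampling rate reported in the paper
SEGMENT_LEN_S = 288      # segment length reported in the paper
FRAMES_PER_SEGMENT = FPS * SEGMENT_LEN_S  # 1440 frames share one label

NUM_SEGMENTS = 1000      # placeholder; determined by the total video length

def segment_labels(frame_indices: torch.Tensor) -> torch.Tensor:
    """Map global frame indices to temporal-segment class labels."""
    return frame_indices // FRAMES_PER_SEGMENT

model = models.mobilenet_v2(num_classes=NUM_SEGMENTS)  # backbone named in the paper
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer/lr

def training_step(frames: torch.Tensor, frame_indices: torch.Tensor) -> float:
    """One optimization step on a batch of frames of shape (B, 3, H, W)."""
    labels = segment_labels(frame_indices)
    loss = criterion(model(frames), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```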
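The 50% training / 50% test iid split used for the downstream evaluations can be reproduced in spirit with a random split such as the one below; the placeholder dataset and the seed are assumptions, since the paper does not report the exact split indices.

```python
# Hedged sketch of a 50% train / 50% test iid split; the dummy dataset and the
# seed are assumptions, not details taken from the paper.
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder stand-in for a labeled evaluation set of frames.
features = torch.randn(200, 3, 64, 64)
labels = torch.randint(0, 26, (200,))
dataset = TensorDataset(features, labels)

n_train = len(dataset) // 2
generator = torch.Generator().manual_seed(0)  # assumed seed
train_set, test_set = random_split(
    dataset, [n_train, len(dataset) - n_train], generator=generator
)
print(len(train_set), len(test_set))  # 100 100
```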
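The reported augmentation ('color and grayscale augmentations as in Chen et al. (2020a)') points to the SimCLR-style recipe; a torchvision sketch is given below. The jitter strength, application probabilities, and resize/crop sizes follow commonly used defaults and are assumptions rather than values quoted from the paper.

```python
# Sketch of SimCLR-style color and grayscale augmentation (Chen et al., 2020a)
# for sampled video frames; strengths and probabilities are the usual SimCLR
# defaults, assumed here rather than quoted from the paper.
from torchvision import transforms

s = 1.0  # assumed color-jitter strength
color_jitter = transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)

frame_transform = transforms.Compose([
    transforms.Resize(256),          # assumed preprocessing
    transforms.CenterCrop(224),      # assumed input size for MobileNetV2
    transforms.RandomApply([color_jitter], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])
```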