Toddler-Inspired Visual Object Learning

Authors: Sven Bambach, David Crandall, Linda Smith, Chen Yu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To test this idea, we used head-mounted cameras and eye trackers to record an approximation of the visual stimuli that infants receive while playing with toys in a naturalistic, everyday play environment. We use these data to train state-of-the-art object models (deep convolutional neural networks, or CNNs) on an object recognition task, and study the performance of the networks as we manipulate various properties of the training dataset."
Researcher Affiliation | Academia | "Sven Bambach¹, David J. Crandall¹, Linda B. Smith², Chen Yu² (¹School of Informatics, Computing, and Engineering; ²Dept. of Psychological and Brain Sciences; Indiana University Bloomington) {sbambach, djcran, smith4, chenyu}@iu.edu"
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide a statement or link indicating that the code for its methodology has been open-sourced.
Open Datasets | No | "To closely approximate the training data used in toddler object learning, we collected visual data from everyday toy play, a context in which toddlers naturally learn about objects and their names." The paper describes collecting its own training data but does not state that the data are publicly available or provide access information.
Dataset Splits | No | "We thus test each network on the clean dataset after every epoch and stop training once the accuracy has not increased for at least two epochs, and report the highest overall classification accuracy achieved up to that point." The paper states that models were 'validated and tested' on the same 'clean object dataset': no separate validation split was held out from the training data; instead, the test set doubled as a validation set for early stopping (see the early-stopping sketch after this table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or other processor types used for running its experiments.
Software Dependencies | No | The paper mentions using the VGG16 architecture and YOLO for object detection but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We used a standard stochastic gradient descent optimizer with a learning rate of 0.001, momentum of 0.9, and a batch size of 64 images. All training images were resized to 224 × 224 pixels, and we did not perform any data augmentation (e.g., left-right reflections or random croppings) since we wanted to use just the data that the infant learners receive. We thus test each network on the clean dataset after every epoch and stop training once the accuracy has not increased for at least two epochs, and report the highest overall classification accuracy achieved up to that point." (A sketch of this training setup follows the table.)
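
The quoted experiment setup is concrete enough to sketch in code. The following is a minimal, non-authoritative illustration, assuming PyTorch/torchvision (the paper does not name its software stack); the dataset paths, ImageFolder layout, and NUM_CLASSES value are placeholders, and whether the VGG16 started from ImageNet weights is not captured in this table.

```python
# Sketch of the stated setup: SGD (lr=0.001, momentum=0.9), batch size 64,
# 224 x 224 inputs, and no data augmentation. All paths are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 24  # placeholder: set to the number of toy object categories
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Resize only -- no left-right reflections or random croppings, since the
# paper deliberately avoids augmenting the infant-view training data.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/toy_play_train", transform=preprocess)  # hypothetical path
clean_set = datasets.ImageFolder("data/clean_test", transform=preprocess)      # hypothetical path
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
clean_loader = DataLoader(clean_set, batch_size=64)

# VGG16 backbone with a classification head sized to the toy categories.
model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)
model = model.to(DEVICE)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```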
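
The Dataset Splits row describes the stopping rule: evaluate on the clean dataset after every epoch, stop once accuracy has not improved for two epochs, and report the best accuracy seen. Continuing the sketch above, this amounts to early stopping with a patience of 2, where the clean test set serves as the validation set.

```python
def clean_accuracy(model, loader, device):
    """Classification accuracy on the clean object dataset."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total

best_acc, stale_epochs = 0.0, 0
while stale_epochs < 2:  # stop once accuracy hasn't improved for 2 epochs
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images.to(DEVICE)), labels.to(DEVICE))
        loss.backward()
        optimizer.step()
    acc = clean_accuracy(model, clean_loader, DEVICE)
    if acc > best_acc:
        best_acc, stale_epochs = acc, 0  # new best: reset the patience counter
    else:
        stale_epochs += 1

# Report the highest overall classification accuracy achieved up to this point.
print(f"Highest clean-set accuracy: {best_acc:.3f}")
```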