Learning State-Aware Visual Representations from Audible Interactions

Authors: Himangi Mittal, Pedro Morgado, Unnat Jain, Abhinav Gupta

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate these contributions extensively on two large-scale egocentric datasets, EPIC-Kitchens-100 and the recently released Ego4D, and show improvements on several downstream tasks, including action recognition, long-term action anticipation, and object state change classification.
Researcher Affiliation | Collaboration | Himangi Mittal¹, Pedro Morgado¹, Unnat Jain², Abhinav Gupta¹ (¹Carnegie Mellon University, ²Meta AI Research)
Pseudocode | No | The paper does not include explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps formatted like code.
Open Source Code | Yes | Code and pretrained model are available here: https://github.com/HimangiM/RepLAI
Open Datasets | Yes | We evaluate on two egocentric datasets: EPIC-Kitchens-100 [14] and Ego4D [27].
Dataset Splits | Yes | Video action recognition (AR) on EPIC-Kitchens-100 and Ego4D. Given a short video clip, the task is to classify the verb and noun of the action taking place. This is done using two separate linear classifiers trained for this task. We report the top-1 and top-5 accuracies, following [14] (Tab. 1) and [27] (Tab. 2).
Hardware Specification | Yes | Models are trained with stochastic gradient descent for 100 epochs with a batch size of 128 over 4 GTX 1080 Ti GPUs, a learning rate of 0.005, and a momentum of 0.9. For Ego4D, we use a batch size of 512 over 8 RTX 2080 Ti GPUs with a learning rate of 0.05.
Software Dependencies | No | The paper mentions software components such as the 'R(2+1)D video encoder' and a '2D CNN' but does not specify version numbers for the programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Models are trained with stochastic gradient descent for 100 epochs with a batch size of 128 over 4 GTX 1080 Ti GPUs, a learning rate of 0.005, and a momentum of 0.9. For Ego4D, we use a batch size of 512 over 8 RTX 2080 Ti GPUs with a learning rate of 0.05. The two loss terms in Eq. 7 are equally weighted with α = 0.5.
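The evaluation protocol quoted above reports top-1 and top-5 accuracies from linear classifiers. As a minimal illustrative sketch (not the authors' code, using hypothetical toy scores), top-k accuracy over per-class classifier scores can be computed as:

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for per_class, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        top_k = sorted(range(len(per_class)), key=lambda i: per_class[i], reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)

# Toy 3-class example (hypothetical scores, not from the paper)
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 2, 0]
top1 = top_k_accuracy(scores, labels, 1)  # only the first sample is correct at k=1
top2 = top_k_accuracy(scores, labels, 2)
```

In the papers' benchmarks the same computation is applied separately to the verb and noun classifier outputs.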
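The quoted setup trains with SGD (learning rate 0.005, momentum 0.9 on EPIC-Kitchens-100) and weights the two loss terms of Eq. 7 equally with α = 0.5. A minimal sketch of that configuration follows; `loss_a` and `loss_b` are placeholder names for the two terms, and the update rule is one common momentum formulation, not the authors' released implementation:

```python
def combined_loss(loss_a, loss_b, alpha=0.5):
    # The two loss terms of Eq. 7, equally weighted with alpha = 0.5
    # (loss_a / loss_b are placeholder names for those terms).
    return alpha * loss_a + (1 - alpha) * loss_b

def sgd_momentum_step(params, grads, velocity, lr=0.005, momentum=0.9):
    # Momentum update: v <- momentum * v + g;  p <- p - lr * v
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        v = momentum * v + g
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity

# One toy step with the quoted EPIC-Kitchens hyperparameters (lr=0.005, momentum=0.9)
params, vel = sgd_momentum_step([1.0], [2.0], [0.0])
```

For Ego4D, the same sketch would use `lr=0.05`; batch size and GPU count affect only how gradients are accumulated, not the update rule itself.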