Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Shaping embodied agent behavior with activity-context priors from egocentric video
Authors: Tushar Nagarajan, Kristen Grauman
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate how well our agents learn complex interaction tasks using our human video based reward. ... Table 1 shows success rates across all tasks. ... Fig. 5 shows consolidated results across all tasks, treating each episode of each task as an individual instance that can be successful or not. |
| Researcher Affiliation | Collaboration | Tushar Nagarajan, UT Austin and Facebook AI Research; Kristen Grauman, UT Austin and Facebook AI Research |
| Pseudocode | Yes | See Supp. for pseudo-code of the memory update and reward allocation step. |
| Open Source Code | Yes | Project page: http://vision.cs.utexas.edu/projects/ego-rewards/ |
| Open Datasets | Yes | To train policies, we use the AI2-iTHOR [33] simulator... To learn activity-context priors, we use all 55 hours of video from EPIC-Kitchens [13], which contains egocentric videos of daily, unscripted kitchen activities in a variety of homes. It consists of 40k video clips annotated for interactions spanning 352 objects (O_V) and 125 actions. Note that we use clip boundaries to segment actions, but we do not use the action labels in our method. |
| Dataset Splits | Yes | We use all 30 kitchen scenes from AI2-iTHOR, split into training (25) and testing (5) sets. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a ResNet-18 [29] encoder, an LSTM, an MLP, DDPPO [64] for training, and the GloVe [45] word embedding space. However, it does not provide version numbers for these software components or libraries. |
| Experiment Setup | Yes | We train our agents using DDPPO [64] for 5M steps, with rollouts of T = 256 time steps. Our model and all baselines use visual encoders from agents that are pre-trained for interaction exploration [40] for 5M steps, which we find benefits all approaches. See Fig. 3 and Supp. for architecture, hyperparameter and training details. |