Semantic Visual Navigation by Watching YouTube Videos
Authors: Matthew Chang, Arjun Gupta, Saurabh Gupta
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show results on the Object Goal task in novel environments [3]. Our experiments test the extent to which we are able to learn semantic cues for navigation by watching videos, and how this compares to alternate techniques for learning such cues via direct interaction. |
| Researcher Affiliation | Academia | Matthew Chang, Arjun Gupta, Saurabh Gupta, University of Illinois at Urbana-Champaign. {mc48, arjung2, saurabhg}@illinois.edu |
| Pseudocode | No | The paper describes procedural steps and equations (e.g., Q-learning form), but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Project website with code, models, and videos: https://matthewchang.github.io/value-learning-from-videos/. |
| Open Datasets | Yes | We use the Habitat simulator [52] with the Gibson environments [68] (100 training environments from the medium split, and the 5 validation environments from the tiny split). |
| Dataset Splits | Yes | We split the 105 environments into three sets: Etrain, Etest, and Evideo with 15, 5, and 85 environments respectively. |
| Hardware Specification | No | The paper describes the robot model and its sensors, but it does not specify the hardware (e.g., GPU, CPU models, or memory) used for training the models or running the simulations. |
| Software Dependencies | No | The paper mentions various software components and algorithms used, such as 'ResNet-18', 'Mask R-CNN', 'Habitat simulator', 'PPO', 'Double DQN', and 'Adam', but it does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | Inverse model ψ processes RGB images I_t and I_{t+1} using a ResNet-18 model [29], stacks the resulting convolutional feature maps, and further processes them using 2 convolutional layers and 2 fully connected layers to obtain the final prediction for the intervening action. We use Double DQN ... with Adam [34] for training the Q-networks, and set γ = 0.99. As our reward is bounded between 0 and 1, clipping the target value between 0 and 1 led to more stable training. |
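
The Experiment Setup row describes the inverse model ψ only at the block level (ResNet-18 features for I_t and I_{t+1}, stacked, then 2 convolutional and 2 fully connected layers predicting the intervening action). The following is a minimal PyTorch sketch of that structure; the layer widths, kernel sizes, input resolution, and the number of discrete actions are assumptions, not values taken from the paper.

```python
# Sketch of the inverse model psi: ResNet-18 features of two consecutive frames,
# stacked along channels, then 2 conv layers and 2 fully connected layers that
# predict the intervening action. Widths and num_actions are assumed, not from the paper.
import torch
import torch.nn as nn
import torchvision.models as models


class InverseModel(nn.Module):
    def __init__(self, num_actions: int = 4):  # num_actions is an assumption
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep the convolutional trunk, drop the average pool and classifier head.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H', W')
        self.conv = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=3, padding=1),  # stacked maps: 512 + 512 channels
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Sequential(
            nn.Linear(64, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_actions),  # logits over intervening actions
        )

    def forward(self, img_t: torch.Tensor, img_t1: torch.Tensor) -> torch.Tensor:
        f_t = self.encoder(img_t)          # features of I_t
        f_t1 = self.encoder(img_t1)        # features of I_{t+1}
        x = torch.cat([f_t, f_t1], dim=1)  # stack the feature maps along channels
        x = self.conv(x).flatten(1)
        return self.fc(x)


# Usage sketch: predict the action between two 224x224 RGB frames.
model = InverseModel(num_actions=4)
logits = model(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
action = logits.argmax(dim=1)
```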
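
The same row also states that the Q-networks are trained with Double DQN and Adam, γ = 0.99, and that the bootstrapped target is clipped to [0, 1] because the reward is bounded in that range. Below is a hedged sketch of such an update; the network interfaces, replay-batch contents, loss choice, and learning rate are assumptions for illustration only.

```python
# Double DQN target with gamma = 0.99 and the target clipped to [0, 1],
# mirroring the training details quoted in the Experiment Setup row.
# q_net, target_net, and the replay batch layout are assumed placeholders.
import torch
import torch.nn.functional as F

GAMMA = 0.99


def double_dqn_loss(q_net, target_net, batch):
    obs, actions, rewards, next_obs, done = batch  # tensors sampled from a replay buffer

    # Q(s, a) for the actions actually taken.
    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN: the online network selects the greedy action,
        # the target network evaluates it.
        next_actions = q_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
        target = rewards + GAMMA * (1.0 - done.float()) * next_q
        # Reward is bounded in [0, 1], so clip the target for more stable training.
        target = target.clamp(0.0, 1.0)

    return F.smooth_l1_loss(q_values, target)


# Training step sketch (Adam as stated in the paper; the learning rate is assumed):
# optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)
# loss = double_dqn_loss(q_net, target_net, replay.sample(batch_size))
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```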