Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation
Authors: Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our automated NTT on a navigation task in a complex 3D environment. We investigate six classification models to shed light on the types of architectures best suited to this task, and validate them against data collected through a human NTT. Our best models achieve high accuracy when distinguishing true human and agent behavior. |
| Researcher Affiliation | Industry | Microsoft Research, Cambridge, UK; Microsoft Research, New York, NY, USA; Ninja Theory, Cambridge, UK. |
| Pseudocode | No | The paper includes figures illustrating model architectures but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | CNN. Convolutional models are applied to image input (visual, top-down and bar-code observations). We use a VGG-16 (Simonyan & Zisserman, 2014) pre-trained on Imagenet (Deng et al., 2009) to extract visual features. (A feature-extraction sketch follows the table.) |
| Dataset Splits | Yes | A total of 140 human recordings were collected, which we split into 100 videos (from 4 players) for classifier training and validation, and 40 (3 remaining players) for testing. Training and hyperparameter tuning was performed using 5-fold cross validation on trajectories generated by agent checkpoints and human players that were fully separate from those that generated test data. (A data-split sketch follows the table.) |
| Hardware Specification | No | The paper mentions “recorded replays on machines that met the system requirements of the experimental game build, including GPU rendering support” and agent training on “60 parallel game instances”, but does not provide specific hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions “Tensorflow (Abadi et al., 2015)”, “PPO (Schulman et al., 2017)”, and “VGG-16 (Simonyan & Zisserman, 2014)”, but does not specify exact version numbers for the software libraries or models used in the experiments. |
| Experiment Setup | Yes | The agents were trained using PPO (Schulman et al., 2017)... The reward signal during training consists of a dense reward for minimizing the distance..., a +1 reward for reaching the target, and a -1 penalty for dying... a small per-step penalty of 0.01 encourages efficient task completion. Episodes end when agents reach the goal radius or after 3,000 game ticks... Training and hyperparameter tuning was performed using 5-fold cross validation... See Appendix A.1 for training details and hyperparameters. (A reward sketch follows the table.) |
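
The "Open Datasets" row quotes the paper's use of a VGG-16 pre-trained on ImageNet as a frozen feature extractor for its image-based classifiers. The sketch below illustrates that idea with `tf.keras`; the specific API calls, pooling choice, and 224x224 input size are our assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): per-frame features from an
# ImageNet-pretrained VGG-16, as quoted in the CNN description above.
import numpy as np
import tensorflow as tf

# VGG-16 without its classification head; global average pooling gives one
# 512-dimensional feature vector per frame.
backbone = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False  # frozen feature extractor

def extract_features(frames: np.ndarray) -> np.ndarray:
    """frames: (batch, 224, 224, 3) observations (visual, top-down, or bar-code)."""
    x = tf.keras.applications.vgg16.preprocess_input(frames.astype(np.float32))
    return backbone(x, training=False).numpy()  # shape (batch, 512)
```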
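
The "Dataset Splits" row reports 100 human trajectories (4 players) for classifier training and validation with 5-fold cross-validation, and 40 trajectories (3 held-out players) for testing. The sketch below shows that protocol with scikit-learn, using placeholder data in place of the real recordings; the variable names and loading step are assumptions.

```python
# Minimal sketch of the reported split: 140 recordings -> 100 for training/validation
# (4 players, tuned with 5-fold cross-validation) and 40 for testing (3 held-out players).
# Placeholder indices stand in for real trajectories; this is not the authors' code.
from sklearn.model_selection import KFold

human_trajectories = list(range(140))      # placeholder for the 140 recorded trajectories
train_val = human_trajectories[:100]       # 4 players: classifier training and tuning
held_out_test = human_trajectories[100:]   # 3 remaining players: final evaluation only

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(train_val)):
    train_split = [train_val[i] for i in train_idx]
    val_split = [train_val[i] for i in val_idx]
    # Fit the classifier on train_split and tune hyperparameters on val_split;
    # held_out_test is used only once, after model selection.
```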
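
The "Experiment Setup" row quotes the agent-training reward: a dense term for reducing distance to the target, +1 for reaching it, -1 for dying, a 0.01 per-step penalty, and episodes that end at the goal radius or after 3,000 game ticks. The sketch below assembles those pieces; the goal-radius value and the function signatures are assumptions.

```python
# Hedged sketch of the quoted reward composition; only the per-step penalty and
# tick limit are taken from the paper, everything else is an assumption.
GOAL_RADIUS = 1.0      # assumed value; the paper refers to a goal radius without quoting a number here
MAX_TICKS = 3000       # episode limit quoted in the paper
STEP_PENALTY = 0.01    # per-step penalty quoted in the paper

def step_reward(prev_dist: float, dist: float, reached_goal: bool, died: bool) -> float:
    """Reward for one environment step."""
    reward = prev_dist - dist      # dense shaping: positive when the agent moves closer
    reward -= STEP_PENALTY         # encourages efficient task completion
    if reached_goal:
        reward += 1.0              # bonus for reaching the target
    if died:
        reward -= 1.0              # penalty for dying
    return reward

def episode_done(dist: float, tick: int) -> bool:
    """Episodes end on reaching the goal radius or after 3,000 game ticks."""
    return dist <= GOAL_RADIUS or tick >= MAX_TICKS
```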