GUIDE: Real-Time Human-Shaped Agents
Authors: Lingyu Zhang, Zhengran Ji, Nicholas Waytowich, Boyuan Chen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our human study involving 50 subjects offers strong quantitative and qualitative evidence of the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | Lingyu Zhang¹, Zhengran Ji¹, Nicholas R. Waytowich², Boyuan Chen¹ (¹Duke University, ²Army Research Laboratory) |
| Pseudocode | No | The paper describes the algorithms and framework but does not include a formal pseudocode block or an explicitly labeled "Algorithm" section. |
| Open Source Code | Yes | We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility. |
| Open Datasets | Yes | We conduct our experiments on the CREW [51] platform. ... We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility. |
| Dataset Splits | Yes | To prevent overfitting, we held out 1 out of 5 trajectories as a validation set. |
| Hardware Specification | Yes | All human subject experiments are conducted on desktops with one NVIDIA RTX 4080 GPU. All evaluations are run on a headless server with 8 NVIDIA RTX A6000 and NVIDIA RTX 3090 Ti. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer", "DDPG", and "SAC" but does not specify version numbers for general software libraries or programming languages (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We used an Adam optimizer with a fixed learning rate of 1e-4 for RL policy training, with a discount factor of γ = 0.99. We applied gradient clipping, setting the maximum gradient norm to 1. For the learned feedback model, we used the same Adam optimizer with a 1e-4 learning rate and employed early stopping based on the loss on held-out trajectories. For Deep TAMER's credit assignment window, we used the same uniform [0.2, 4] distribution as in the original paper. We used a shorter window of [0.2, 1] for Find Treasure and Hide-and-Seek. For these more difficult navigation tasks, we stacked three consecutive frames as input. |
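
The experiment-setup and dataset-split rows above describe the reported training configuration (Adam at 1e-4, γ = 0.99, max gradient norm 1, early stopping on a held-out trajectory, a uniform credit-assignment window, and 3-frame stacking). The sketch below collects those quoted settings into PyTorch-style code. It is only an illustration under stated assumptions: `PolicyNet`, the observation/action sizes, and the `update` helper are hypothetical placeholders, since the paper's actual network architecture, feedback model, and loss are not given in this table.

```python
# Minimal sketch of the hyperparameters quoted in the table above.
# Only GAMMA, LR, MAX_GRAD_NORM, the credit-assignment windows, and
# FRAME_STACK come from the paper; everything else is a placeholder.
import random
from collections import deque

import torch
import torch.nn as nn

GAMMA = 0.99                     # discount factor reported in the paper
LR = 1e-4                        # fixed Adam learning rate for RL policy training
MAX_GRAD_NORM = 1.0              # gradient clipping threshold
CREDIT_WINDOW = (0.2, 4.0)       # Deep TAMER credit-assignment window (seconds)
NAV_CREDIT_WINDOW = (0.2, 1.0)   # shorter window for Find Treasure / Hide-and-Seek
FRAME_STACK = 3                  # consecutive frames stacked for navigation tasks


class PolicyNet(nn.Module):
    """Hypothetical stand-in for the actual GUIDE policy network."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def sample_credit_delay(navigation_task: bool = False) -> float:
    """Draw a credit-assignment delay from the uniform window used for Deep TAMER."""
    low, high = NAV_CREDIT_WINDOW if navigation_task else CREDIT_WINDOW
    return random.uniform(low, high)


def stack_frames(frames: deque) -> torch.Tensor:
    """Concatenate the last FRAME_STACK observations along the feature axis."""
    return torch.cat(list(frames), dim=-1)


# Optimizer and gradient clipping exactly as quoted; the RL loss itself
# (DDPG/SAC-style with human feedback) and the early-stopping loop over the
# held-out trajectory (1 of 5) are omitted here.
obs_dim, act_dim = 30, 4  # illustrative sizes only
policy = PolicyNet(obs_dim * FRAME_STACK, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)


def update(loss: torch.Tensor) -> None:
    """One gradient step with the reported clipping threshold."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy.parameters(), MAX_GRAD_NORM)
    optimizer.step()
```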