Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
GUIDE: Real-Time Human-Shaped Agents
Authors: Lingyu Zhang, Zhengran Ji, Nicholas Waytowich, Boyuan Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our human study involving 50 subjects offers strong quantitative and qualitative evidence of the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | Lingyu Zhang¹, Zhengran Ji¹, Nicholas R Waytowich², Boyuan Chen¹; ¹Duke University, ²Army Research Laboratory |
| Pseudocode | No | The paper describes the algorithms and framework but does not include a formal pseudocode block or an explicitly labeled "Algorithm" section. |
| Open Source Code | Yes | We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility. |
| Open Datasets | Yes | We conduct our experiments on the CREW [51] platform. ... We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility. |
| Dataset Splits | Yes | To prevent overfitting, we held out 1 out of 5 trajectories as a validation set. |
| Hardware Specification | Yes | All human subject experiments are conducted on desktops with one NVIDIA RTX 4080 GPU. All evaluations are run on a headless server with 8 NVIDIA RTX A6000 and NVIDIA RTX 3090 Ti. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer", "DDPG", and "SAC" but does not specify version numbers for general software libraries or programming languages (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We used an Adam optimizer with a fixed learning rate of 1e-4 for RL policy training, with a discount factor of γ = 0.99. We applied gradient clipping setting max grad norm to 1. For the learned feedback model, we used the same Adam optimizer with 1e-4 learning rate and employed early stopping based on the loss on held-out trajectories. For Deep TAMER's credit assignment window, we used the same uniform [0.2, 4] distribution as in the original paper. We used a shorter window of [0.2, 1] for Find treasure and Hide-and-Seek. For these more difficult navigation tasks, we stacked three consecutive frames as input. |
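
The Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. The block below is only an illustration of the quoted values, not the authors' code: every class, function, and field name is hypothetical. The default frame stack of 1, the choice of which trajectory in each group of five is held out, and the early-stopping patience are assumptions, since the quoted text does not state them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    learning_rate: float = 1e-4   # fixed Adam learning rate (RL policy and feedback model)
    discount_factor: float = 0.99  # gamma
    max_grad_norm: float = 1.0    # gradient clipping threshold
    credit_window: Tuple[float, float] = (0.2, 4.0)  # uniform credit assignment window
    frame_stack: int = 1          # ASSUMPTION: no stacking unless the text says otherwise


def setup_for_task(task: str) -> ExperimentSetup:
    """Return the setup for a task, using the shorter window and three stacked
    frames for the two navigation tasks named in the quoted text."""
    if task.lower() in {"find treasure", "hide-and-seek"}:
        return ExperimentSetup(credit_window=(0.2, 1.0), frame_stack=3)
    return ExperimentSetup()


def split_trajectories(trajectories: Sequence, group: int = 5) -> Tuple[List, List]:
    """Hold out 1 of every `group` trajectories for validation.

    ASSUMPTION: the last trajectory of each group is held out; the quoted
    text only says that 1 out of 5 trajectories was held out.
    """
    train = [t for i, t in enumerate(trajectories) if i % group != group - 1]
    val = [t for i, t in enumerate(trajectories) if i % group == group - 1]
    return train, val


def should_stop(val_losses: Sequence[float], patience: int = 3) -> bool:
    """Early stopping on held-out loss: stop once the last `patience` epochs
    have not improved on the best earlier loss.

    ASSUMPTION: the patience value is illustrative; it is not given in the text.
    """
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


if __name__ == "__main__":
    print(setup_for_task("Hide-and-Seek"))
    print(split_trajectories(list(range(10))))
    print(should_stop([1.0, 0.8, 0.9, 0.85, 0.81]))
```

Keeping the per-task overrides in one function makes it easy to see which settings differ between tasks, and which are shared across all of them.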