GUIDE: Real-Time Human-Shaped Agents

Authors: Lingyu Zhang, Zhengran Ji, Nicholas Waytowich, Boyuan Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our human study involving 50 subjects offers strong quantitative and qualitative evidence of the effectiveness of our approach."
Researcher Affiliation | Collaboration | Lingyu Zhang (1), Zhengran Ji (1), Nicholas R. Waytowich (2), Boyuan Chen (1); (1) Duke University, (2) Army Research Laboratory
Pseudocode | No | The paper describes the algorithms and framework but does not include a formal pseudocode block or an explicitly labeled "Algorithm" section.
Open Source Code | Yes | "We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility."
Open Datasets | Yes | "We conduct our experiments on the CREW [51] platform. ... We will also open-source the entire code base, including algorithms and task environments for the broader community for full reproducibility."
Dataset Splits | Yes | "To prevent overfitting, we held out 1 out of 5 trajectories as a validation set."
Hardware Specification | Yes | "All human subject experiments are conducted on desktops with one NVIDIA RTX 4080 GPU. All evaluations are run on a headless server with 8 NVIDIA RTX A6000 and NVIDIA RTX 3090 Ti."
Software Dependencies | No | The paper mentions software components like "Adam optimizer", "DDPG", and "SAC" but does not specify version numbers for general software libraries or programming languages (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We used an Adam optimizer with a fixed learning rate of 1e-4 for RL policy training, with a discount factor of γ = 0.99. We applied gradient clipping setting max grad norm to 1. For the learned feedback model, we used the same Adam optimizer with 1e-4 learning rate and employed early stopping based on the loss on held-out trajectories. For Deep TAMER's credit assignment window, we used the same uniform [0.2, 4] distribution as in the original paper. We used a shorter window of [0.2, 1] for Find treasure and Hide-and-Seek. For these more difficult navigation tasks, we stacked three consecutive frames as input."
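The Experiment Setup quote above pins down the main RL training hyperparameters. The sketch below restates them as a minimal PyTorch-style configuration, assuming a placeholder policy network, input resolution, and update function; those assumptions (and all names such as optimization_step) are illustrative only and are not taken from the authors' released code.

    import torch
    from torch import nn, optim

    # Hyperparameters quoted in the paper.
    LEARNING_RATE = 1e-4   # fixed Adam learning rate for RL policy training
    GAMMA = 0.99           # discount factor (used when computing TD targets)
    MAX_GRAD_NORM = 1.0    # gradient clipping threshold
    FRAME_STACK = 3        # consecutive frames stacked for the harder navigation tasks

    # Placeholder policy network; the real architecture and input size are
    # assumptions, not specified by the quoted setup.
    policy = nn.Sequential(
        nn.Flatten(),
        nn.Linear(FRAME_STACK * 64 * 64, 256),
        nn.ReLU(),
        nn.Linear(256, 2),
    )
    optimizer = optim.Adam(policy.parameters(), lr=LEARNING_RATE)

    def optimization_step(loss: torch.Tensor) -> None:
        """One gradient step with the reported max-grad-norm clipping."""
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(policy.parameters(), MAX_GRAD_NORM)
        optimizer.step()

The feedback model is reported to use the same optimizer and learning rate, with early stopping driven by the loss on the held-out (1-of-5) validation trajectory.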
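For the Deep TAMER credit-assignment window, the quote specifies a uniform distribution over feedback delays of [0.2, 4] seconds (shortened to [0.2, 1] seconds for Find treasure and Hide-and-Seek). A hedged sketch of that windowing rule follows; the function name and the point-in-time treatment of transitions are simplifying assumptions for illustration, not the released implementation.

    import numpy as np

    def credit_weights(feedback_time, transition_times, low=0.2, high=4.0):
        """Uniform credit assignment over feedback delays.

        A transition that occurred between `low` and `high` seconds before
        the human feedback receives weight 1 / (high - low); all others get 0.
        """
        delays = feedback_time - np.asarray(transition_times, dtype=float)
        in_window = (delays >= low) & (delays <= high)
        return in_window.astype(float) / (high - low)

    # Example: feedback at t = 10.0 s. A transition at 8.0 s (delay 2.0 s)
    # falls inside the window and gets weight 1 / 3.8; transitions at 5.5 s
    # (delay 4.5 s) and 9.9 s (delay 0.1 s) fall outside and get weight 0.
    weights = credit_weights(10.0, [5.5, 9.9, 8.0])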