Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CREW: Facilitating Human-AI Teaming Research

Authors: Lingyu Zhang, Zhengran Ji, Boyuan Chen

TMLR 2024

Reproducibility Variable — Result — LLM Response
Research Type — Experimental
"With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark. We demonstrate CREW's potential by benchmarking real-time human-guided reinforcement learning (RL) algorithms alongside various RL baselines."
Researcher Affiliation — Academia
"Lingyu Zhang (Duke University), Zhengran Ji (Duke University), Boyuan Chen (Duke University)"
Pseudocode — Yes
"Algorithm 1: The c-Deep TAMER algorithm."
Open Source Code — Yes
"Our fully open-sourced code base and detailed documentation can be found at https://github.com/generalroboticslab/CREW.git."
Open Datasets — No
"The paper describes a platform for conducting human-AI teaming research and collecting data, but it does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for any specific publicly available datasets used or generated by their experiments."
Dataset Splits — No
"The paper describes evaluation procedures for checkpoints, such as evaluating for '1 game (10 rolls)' or '100 episodes' on 'unseen test environments', but it does not provide specific dataset split information (percentages, sample counts, or detailed methodology) for splitting a larger dataset into training, validation, and test sets."
Hardware Specification — Yes
"All human subject experiments were conducted on desktops with one NVIDIA RTX 4080 GPU. All evaluations were run on a headless server with 8 NVIDIA RTX A6000 and NVIDIA RTX 3090 Ti GPUs."
Software Dependencies — Yes
"The environments of CREW are implemented using Unity 2021.3.24f1, with packages ML Agents 2.3.0-exp.3 (Juliani et al., 2018), Netcode for Game Objects 1.3, and Nakama Unity 3.6.0. Algorithms are developed with torchrl 0.3.0 (Bou et al., 2023)."
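The Python-side dependency quoted above can be pinned directly; a minimal sketch (the Unity-side packages — Unity 2021.3.24f1, ML Agents 2.3.0-exp.3, Netcode for Game Objects 1.3, Nakama Unity 3.6.0 — are managed through the Unity package manager rather than pip):

```shell
# Pin the documented torchrl version used for the algorithms.
pip install torchrl==0.3.0
```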
Experiment Setup — Yes
"The hyperparameter settings for our experiments are summarized in Table 4."

Table 4: Hyperparameters

Hyperparameter             c-Deep TAMER                       DDPG         SAC
γ                          0.99                               0.99         0.99
learning rate              1e-4                               1e-4         1e-4
max_grad_norm              0.1                                0.1          0.1
batch size                 16                                 240          240
frames per batch           8                                  240          240
alpha_init                 -                                  -            0.1
target entropy             -                                  -            -6.0
actor scale_lb             -                                  -            1e-4
# Q value nets             -                                  2            2
target update polyak       0.995                              0.995        0.995
actor exploration noise    N(0, 0.1)                          N(0, 0.1)    -
credit assignment window   bowling [0.2, 4], others [0.2, 1]  -            -
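The credit-assignment window in Table 4 (e.g. [0.2, 1] seconds for non-bowling tasks) can be illustrated with a minimal sketch of Deep TAMER-style uniform credit assignment: each human feedback signal is credited to the recent agent steps that fall inside the window. The function name and signature below are hypothetical, not CREW's actual implementation.

```python
import torch

def credit_weights(feedback_time, step_times, window=(0.2, 1.0)):
    """Uniform credit-assignment weights for one human feedback signal.

    A step taken at time t is credited if the feedback arrived between
    window[0] and window[1] seconds after it; credited steps share the
    feedback equally (weights normalized to sum to 1).
    """
    t = torch.as_tensor(step_times, dtype=torch.float32)
    delay = feedback_time - t                       # feedback lag per step
    mask = (delay >= window[0]) & (delay <= window[1])
    w = mask.float()
    total = w.sum()
    return w / total if total > 0 else w

# Example: agent steps every 0.1 s, human feedback arrives at t = 2.05 s.
steps = [i * 0.1 for i in range(21)]                # 0.0 .. 2.0 s
w = credit_weights(2.05, steps, window=(0.2, 1.0))
# Steps at 1.1 .. 1.8 s fall inside the window and share the credit.
```

With the bowling window [0.2, 4] from Table 4, a much longer span of steps would be credited, reflecting the delayed outcome of a bowling roll.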