Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CREW: Facilitating Human-AI Teaming Research
Authors: Lingyu Zhang, Zhengran Ji, Boyuan Chen
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark. We demonstrate CREW's potential by benchmarking real-time human-guided reinforcement learning (RL) algorithms alongside various RL baselines. |
| Researcher Affiliation | Academia | Lingyu Zhang EMAIL Duke University Zhengran Ji EMAIL Duke University Boyuan Chen EMAIL Duke University |
| Pseudocode | Yes | Algorithm 1 The c-Deep TAMER algorithm. |
| Open Source Code | Yes | Our fully open-sourced code base and detailed documentation can be found at https://github.com/generalroboticslab/CREW.git. |
| Open Datasets | No | The paper describes a platform for conducting human-AI teaming research and collecting data, but it does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for any specific publicly available datasets used or generated by their experiments. |
| Dataset Splits | No | The paper describes evaluation procedures for checkpoints, such as evaluating for "1 game (10 rolls)" or "100 episodes" on "unseen test environments", but it does not provide specific dataset split information (percentages, sample counts, or detailed methodology) for partitioning a larger dataset into training, validation, and test sets. |
| Hardware Specification | Yes | All human subject experiments were conducted on desktops with one NVIDIA RTX 4080 GPU. All evaluations were run on a headless server with 8 NVIDIA RTX A6000 and NVIDIA RTX 3090 Ti. |
| Software Dependencies | Yes | The environments of CREW are implemented using Unity 2021.3.24f1, with packages ML Agents 2.3.0-exp.3 Juliani et al. (2018), Netcode for Game Objects 1.3, and Nakama Unity 3.6.0. Algorithms are developed with torchrl 0.3.0 Bou et al. (2023). |
| Experiment Setup | Yes | The hyperparameter settings for our experiments are summarized in Table 4. Table 4: Hyperparameters (columns: c-Deep TAMER, DDPG, SAC): γ 0.99, 0.99, 0.99; learning rate 1e-4, 1e-4, 1e-4; max_grad_norm 0.1, 0.1, 0.1; batch size 16, 240, 240; frames per batch 8, 240, 240; alpha_init 0.1; target entropy -6.0; actor scale_lb 1e-4; # Q value nets 2, 2; target update polyak 0.995, 0.995, 0.995; actor exploration noise N(0, 0.1), N(0, 0.1); credit assignment window bowling [0.2, 4], others [0.2, 1]. |
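The Table 4 hyperparameters quoted above can be sketched as a plain config structure. This is a hypothetical representation for readability, not code from the paper; in particular, the assignment of the sparse entries (alpha_init, target entropy, actor scale_lb, # Q value nets, exploration noise, credit assignment window) to specific algorithms is an assumption based on common usage, since the flattened table does not state it explicitly.

```python
# Hypothetical sketch of the Table 4 hyperparameters as a config dict.
# Shared values (gamma, learning rate, etc.) are taken directly from the
# quoted table; the mapping of algorithm-specific entries is an assumption.
HYPERPARAMS = {
    "c-Deep TAMER": {
        "gamma": 0.99,
        "learning_rate": 1e-4,
        "max_grad_norm": 0.1,
        "batch_size": 16,
        "frames_per_batch": 8,
        "target_update_polyak": 0.995,
        "actor_exploration_noise": ("normal", 0.0, 0.1),  # N(0, 0.1)
        # Credit assignment window per task, as quoted in the table.
        "credit_assignment_window": {"bowling": (0.2, 4), "others": (0.2, 1)},
    },
    "DDPG": {
        "gamma": 0.99,
        "learning_rate": 1e-4,
        "max_grad_norm": 0.1,
        "batch_size": 240,
        "frames_per_batch": 240,
        "num_q_value_nets": 2,  # assumed column assignment
        "target_update_polyak": 0.995,
        "actor_exploration_noise": ("normal", 0.0, 0.1),
    },
    "SAC": {
        "gamma": 0.99,
        "learning_rate": 1e-4,
        "max_grad_norm": 0.1,
        "batch_size": 240,
        "frames_per_batch": 240,
        "alpha_init": 0.1,        # assumed SAC-specific
        "target_entropy": -6.0,   # assumed SAC-specific
        "actor_scale_lb": 1e-4,   # assumed SAC-specific
        "num_q_value_nets": 2,
        "target_update_polyak": 0.995,
    },
}

# Derive which hyperparameters are shared (same value) across all three
# algorithms, mirroring the repeated columns of the flattened table.
shared = {
    key: value
    for key, value in HYPERPARAMS["SAC"].items()
    if all(cfg.get(key) == value for cfg in HYPERPARAMS.values())
}
print(sorted(shared))
```

Running the snippet reports the four settings the table repeats identically across all three algorithms: gamma, learning rate, max_grad_norm, and the target update polyak coefficient.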