Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks, attaining a sample efficiency of about 100x over existing methods, while also improving with larger model capacity (up to the 7 billion scale). ... The goal of our experiments is to evaluate the efficacy of hierarchical RL algorithms derived from ArCHer. |
| Researcher Affiliation | Collaboration | University of California, Berkeley; Google DeepMind. |
| Pseudocode | Yes | The algorithms derived from the ArCHer framework so far are summarized in Algorithm 1. Algorithm 1: ArCHer Practical Framework. |
| Open Source Code | Yes | The project page is https://yifeizhou02.github.io/archer.io/ and code can be found at https://github.com/YifeiZhou02/ArCHer. |
| Open Datasets | Yes | Detective Game (Hausknecht et al., 2019)... Twenty Questions and Twenty Questions Subset (Abdulhai et al., 2023)... Guess My City (Abdulhai et al., 2023)... WebShop (Yao et al., 2023a). We use the official offline dataset provided by Abdulhai et al. (2023) with 100K simulated episodes. |
| Dataset Splits | No | The paper mentions using "SFT dataset" for initialization and "offline dataset" for some tasks, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
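For context on what this row flags as missing, the following is a minimal sketch (not from the paper) of the kind of explicit, seeded split specification that would make such a partition reproducible; the 80/10/10 fractions and the `split_episodes` helper are assumptions for illustration, while the 100K episode count echoes the offline dataset size quoted above.

```python
# Hypothetical illustration of explicit, reproducible dataset splits.
# A fixed seed plus stated fractions fully determine the partition.
import random

def split_episodes(episodes, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/val/test."""
    rng = random.Random(seed)
    shuffled = list(episodes)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_episodes(range(100_000))
print(len(train), len(val), len(test))  # 80000 10000 10000
```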
| Hardware Specification | Yes | TRC TPU credit donations from Google Cloud, and compute credits from the Center for AI Safety (CAIS). |
| Software Dependencies | No | The paper mentions specific models used (e.g., 'GPT-2', 'RoBERTa-base model', 'flan-t5-small') but does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | Table 2: Hyperparameters for All Experiments. This table lists specific values for 'actor lr', 'critic lr', 'batch size', 'rollout trajectories', 'replay buffer size', 'critic updates per iteration', 'discount', 'polyak alpha', 'PPO epochs', 'GAE lambda', 'clip range'. |
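As a sketch of how the hyperparameter fields named in Table 2 could be collected into a single reproducible configuration: every value below is a placeholder assumption, not a number from the paper, and the `ARCHER_CONFIG` name is hypothetical.

```python
# Hypothetical config mirroring the fields listed in Table 2; all values
# are illustrative placeholders, NOT the paper's actual hyperparameters.
ARCHER_CONFIG = {
    "actor_lr": 1e-4,                    # placeholder
    "critic_lr": 1e-4,                   # placeholder
    "batch_size": 64,                    # placeholder
    "rollout_trajectories": 32,          # placeholder
    "replay_buffer_size": 10_000,        # placeholder
    "critic_updates_per_iteration": 10,  # placeholder
    "discount": 0.99,                    # placeholder
    "polyak_alpha": 0.995,               # placeholder
    "ppo_epochs": 4,                     # placeholder
    "gae_lambda": 0.95,                  # placeholder
    "clip_range": 0.2,                   # placeholder
}

# Logging the full dict alongside results is one way to make a run auditable.
print(sorted(ARCHER_CONFIG))
```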