ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks, attaining a sample efficiency of about 100x over existing methods, while also improving with larger model capacity (up to the 7 billion scale). ... The goal of our experiments is to evaluate the efficacy of hierarchical RL algorithms derived from ArCHer. |
| Researcher Affiliation | Collaboration | University of California, Berkeley; Google DeepMind. |
| Pseudocode | Yes | The algorithms derived from the ArCHer framework so far are summarized in Algorithm 1. Algorithm 1: ArCHer Practical Framework |
| Open Source Code | Yes | The project page is https://yifeizhou02.github.io/archer.io/ and code can be found at https://github.com/YifeiZhou02/ArCHer. |
| Open Datasets | Yes | Detective Game (Hausknecht et al., 2019)... Twenty Questions and Twenty Questions Subset (Abdulhai et al., 2023)... Guess My City (Abdulhai et al., 2023)... WebShop (Yao et al., 2023a). We use the official offline dataset provided by Abdulhai et al. (2023) with 100K simulated episodes. |
| Dataset Splits | No | The paper mentions using "SFT dataset" for initialization and "offline dataset" for some tasks, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | TRC TPU credit donations from Google Cloud, and compute credits from the Center for AI Safety (CAIS). |
| Software Dependencies | No | The paper mentions specific models used (e.g., 'GPT-2', 'RoBERTa-base model', 'flan-t5-small') but does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | Table 2: Hyperparameters for All Experiments. This table lists specific values for 'actor lr', 'critic lr', 'batch size', 'rollout trajectories', 'replay buffer size', 'critic updates per iteration', 'discount', 'polyak alpha', 'PPO epochs', 'GAE lambda', 'clip range'. |
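
The Pseudocode and Experiment Setup rows above point to Algorithm 1 and Table 2 of the paper but do not reproduce them. As a reading aid only, the sketch below shows one plausible way the quoted hyperparameter names (actor/critic learning rates, replay buffer, polyak averaging, PPO epochs, GAE lambda, clip range) could fit together in a hierarchical actor-critic loop. All default values are placeholders rather than the paper's settings, and the helper names (`collect_trajectories`, `critic_loss`, `ppo_update`, `polyak_update`) and object interfaces are hypothetical, not the released ArCHer code.

```python
from dataclasses import dataclass


@dataclass
class ArcherHyperparams:
    """Mirrors the hyperparameter names quoted from Table 2.
    Every default below is a placeholder, NOT a value from the paper."""
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    batch_size: int = 64
    rollout_trajectories: int = 32          # fresh trajectories collected per iteration
    replay_buffer_size: int = 10_000
    critic_updates_per_iteration: int = 50
    discount: float = 0.99
    polyak_alpha: float = 0.995             # target-network averaging coefficient
    ppo_epochs: int = 4
    gae_lambda: float = 0.95
    clip_range: float = 0.2


def train_archer_like(env, actor, critic, target_critic, buffer,
                      hp: ArcherHyperparams, iterations: int):
    """Skeleton of one plausible hierarchical loop implied by the hyperparameters:
    an off-policy critic trained from a replay buffer with polyak-averaged targets,
    paired with a PPO-style actor update. The env/actor/critic/buffer interfaces and
    the four helper functions are assumed and would need to be implemented."""
    for _ in range(iterations):
        # 1. Collect fresh rollouts with the current actor and add them to the buffer.
        buffer.add(collect_trajectories(env, actor, n=hp.rollout_trajectories))

        # 2. Off-policy critic updates from replayed data, with a slowly moving target.
        for _ in range(hp.critic_updates_per_iteration):
            batch = buffer.sample(hp.batch_size)
            loss = critic_loss(critic, target_critic, batch, discount=hp.discount)
            critic.step(loss, lr=hp.critic_lr)
            polyak_update(target_critic, critic, alpha=hp.polyak_alpha)

        # 3. PPO-style actor updates on the latest rollouts, using GAE with the
        #    critic's value estimates.
        ppo_update(actor, critic, buffer.latest(), lr=hp.actor_lr,
                   epochs=hp.ppo_epochs, gae_lambda=hp.gae_lambda,
                   clip_range=hp.clip_range)
```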