ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks, attaining a sample efficiency of about 100x over existing methods, while also improving with larger model capacity (up to the 7 billion scale)." "5. Experiments: The goal of our experiments is to evaluate the efficacy of hierarchical RL algorithms derived from ArCHer."
Researcher Affiliation | Collaboration | "1University of California, Berkeley, 2Google DeepMind."
Pseudocode | Yes | "The algorithms derived from the ArCHer framework so far are summarized in Algorithm 1." "Algorithm 1 ArCHer: Practical Framework" (a hedged sketch of this training loop is given after the table)
Open Source Code | Yes | "The project page is https://yifeizhou02.github.io/archer.io/ and code can be found at https://github.com/YifeiZhou02/ArCHer."
Open Datasets | Yes | "Detective Game (Hausknecht et al., 2019)... Twenty Questions and Twenty Questions Subset (Abdulhai et al., 2023)... Guess My City (Abdulhai et al., 2023)... WebShop (Yao et al., 2023a)." "We use the official offline dataset provided by Abdulhai et al. (2023) with 100K simulated episodes."
Dataset Splits | No | The paper mentions using an "SFT dataset" for initialization and an "offline dataset" for some tasks, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | Yes | "TRC TPU credit donations from Google Cloud, and compute credits from the Center for AI Safety (CAIS)."
Software Dependencies | No | The paper mentions specific models used (e.g., 'GPT-2', 'RoBERTa-base model', 'flan-t5-small') but does not provide version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | "Table 2: Hyperparameters for All Experiments." The table lists specific values for 'actor lr', 'critic lr', 'batch size', 'rollout trajectories', 'replay buffer size', 'critic updates per iteration', 'discount', 'polyak alpha', 'PPO epochs', 'GAE lambda', and 'clip range' (a hedged configuration sketch is given below).
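
The Pseudocode row above points to Algorithm 1 (ArCHer: Practical Framework), which this page does not reproduce. The snippet below is therefore only a minimal Python sketch of a hierarchical training loop consistent with the hyperparameters named in the Experiment Setup row: an utterance-level critic trained off-policy from a replay buffer with a polyak-averaged target, and a token-level actor updated over PPO-style epochs using the critic's utterance-level advantages. Every callable passed to train_archer (collect_trajectories, td_update_critic, polyak_update, compute_utterance_advantages, ppo_update_actor) is a hypothetical placeholder, not the released codebase's API.

```python
import random
from collections import deque

def train_archer(collect_trajectories, td_update_critic, polyak_update,
                 compute_utterance_advantages, ppo_update_actor,
                 cfg, num_iterations=10):
    """Sketch of a hierarchical (utterance-level critic / token-level actor) loop.

    cfg is a dict keyed by the hyperparameter names from the Experiment Setup
    row, e.g. the archer_config placeholder shown further below.
    """
    replay_buffer = deque(maxlen=cfg["replay_buffer_size"])

    def sample_batch():
        # Uniformly sample a batch of utterance-level transitions.
        return random.sample(list(replay_buffer),
                             min(cfg["batch_size"], len(replay_buffer)))

    for _ in range(num_iterations):
        # 1) Roll out the current actor to collect utterance-level transitions
        #    (observation, utterance, reward, next observation, done).
        replay_buffer.extend(collect_trajectories(cfg["rollout_trajectories"]))

        # 2) Off-policy critic training: TD updates on replayed batches, with a
        #    polyak-averaged target critic for stability.
        for _ in range(cfg["critic_updates_per_iteration"]):
            td_update_critic(sample_batch(),
                             discount=cfg["discount"], lr=cfg["critic_lr"])
            polyak_update(alpha=cfg["polyak_alpha"])

        # 3) Token-level actor training: utterance-level advantages from the
        #    critic drive a PPO-style clipped policy update.
        batch = sample_batch()
        advantages = compute_utterance_advantages(batch, lam=cfg["gae_lambda"])
        for _ in range(cfg["ppo_epochs"]):
            ppo_update_actor(batch, advantages,
                             clip_range=cfg["clip_range"], lr=cfg["actor_lr"])
```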
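
Similarly, the Experiment Setup row lists only the hyperparameter names reported in the paper's Table 2, not their values. The following is a minimal configuration sketch covering those names; the numeric values are placeholders chosen solely to make the snippet self-contained and are not the paper's reported settings.

```python
# Hypothetical configuration with the hyperparameter names from Table 2 of the
# paper. All numeric values below are placeholders, NOT the reported numbers.
archer_config = {
    "actor_lr": 1e-5,                    # placeholder
    "critic_lr": 1e-4,                   # placeholder
    "batch_size": 64,                    # placeholder
    "rollout_trajectories": 32,          # placeholder
    "replay_buffer_size": 10_000,        # placeholder
    "critic_updates_per_iteration": 50,  # placeholder
    "discount": 0.99,                    # placeholder
    "polyak_alpha": 0.99,                # placeholder
    "ppo_epochs": 4,                     # placeholder
    "gae_lambda": 0.95,                  # placeholder
    "clip_range": 0.2,                   # placeholder
}
```

Taken together with the loop sketch above, a call like train_archer(..., cfg=archer_config) illustrates how the listed hyperparameters would plug into such a training procedure; the paper's Table 2 should be consulted for the actual values.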