Fast Peer Adaptation with Context-aware Exploration
Authors: Long Ma, Yuanfei Wang, Fangwei Zhong, Song-Chun Zhu, Yizhou Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods. |
| Researcher Affiliation | Academia | 1Academy for Advanced Interdisciplinary Studies, Peking University 2Nat'l Key Laboratory of General Artificial Intelligence, BIGAI & PKU 3Center on Frontiers of Computing Studies, School of Computer Science, Peking University 4School of Intelligence Science and Technology, Peking University 5Inst. for Artificial Intelligence, Peking University 6Nat'l Eng. Research Center of Visual Technology, Peking University. |
| Pseudocode | Yes | Algorithm 1 Training Procedure of PACE |
| Open Source Code | Yes | Project page: https://sites.google.com/view/peer-adaptation |
| Open Datasets | No | The paper describes generating its own peer policies (e.g., 'we sample 40 P2 policies for training and 10 P2 policies for testing' for Kuhn Poker) and does not provide access information for a publicly available, pre-existing dataset. |
| Dataset Splits | No | The paper mentions 'training' and 'testing' pools of peer policies but does not explicitly describe a separate 'validation' set or specific splits for validation, such as percentages or sample counts. |
| Hardware Specification | Yes | The training of PACE takes 12 hours with 80 processes on a single Titan Xp GPU. |
| Software Dependencies | No | For all baselines and ablations, we use PPO (Schulman et al., 2017; Kostrikov, 2018) as the RL training algorithm. However, specific version numbers for software dependencies like PyTorch, Python, or CUDA are not provided. |
| Experiment Setup | Yes | Tables 4, 5, and 6 list the hyperparameters for the architectures and PPO training on Kuhn Poker, PO-Overcooked, and Predator-Prey-W, respectively. These include Learning Rate, PPO Clip ϵ, Entropy Coefficient, γ, GAE λ, Batch Size, # Update Epochs, # Mini Batches, Gradient Clipping (L2), Activation Function, Actor/Critic Hidden Dims, fθ Hidden Dims, and gθ Hidden Dims. |
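To make the Experiment Setup row concrete, the sketch below groups the hyperparameter categories listed in Tables 4–6 into a single config object. All field values are illustrative placeholders (common PPO defaults), not the paper's reported settings, and the field names are assumptions chosen to mirror the table headings.

```python
from dataclasses import dataclass, field

@dataclass
class PPOConfig:
    """Hyperparameter groups matching the paper's Tables 4-6.
    Every value here is a placeholder, NOT taken from the paper."""
    learning_rate: float = 3e-4          # Learning Rate
    clip_eps: float = 0.2                # PPO Clip epsilon
    entropy_coef: float = 0.01           # Entropy Coefficient
    gamma: float = 0.99                  # discount factor gamma
    gae_lambda: float = 0.95             # GAE lambda
    batch_size: int = 2048               # Batch Size
    update_epochs: int = 4               # "# Update Epochs"
    num_mini_batches: int = 4            # "# Mini Batches"
    grad_clip_l2: float = 0.5            # Gradient Clipping (L2)
    activation: str = "tanh"             # Activation Function
    actor_critic_hidden: tuple = (64, 64)   # Actor/Critic Hidden Dims
    f_theta_hidden: tuple = (128,)       # f_theta (context encoder) Hidden Dims
    g_theta_hidden: tuple = (128,)       # g_theta Hidden Dims

# One config per testbed, as the paper tabulates them separately.
configs = {
    name: PPOConfig() for name in
    ("Kuhn Poker", "PO-Overcooked", "Predator-Prey-W")
}
```

A per-environment dict like `configs` makes it easy to override only the entries that differ between testbeds while keeping shared PPO defaults in one place.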