Greedy when Sure and Conservative when Uncertain about the Opponents

Authors: Haobo Fu, Ye Tian, Hongxiang Yu, Weiming Liu, Shuang Wu, Jiechao Xiong, Ying Wen, Kai Li, Junliang Xing, Qiang Fu, Wei Yang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental studies on popular benchmarks demonstrate GSCU's superiority over the state-of-the-art methods. The goal of the experimental study is to test the performance of different methods when competing online against unknown and nonstationary opponents. We also validate the effectiveness of each component in GSCU.
Researcher Affiliation | Collaboration | Tencent AI Lab, Shenzhen, China; Shanghai Jiao Tong University, Shanghai, China; University of Science and Technology of China, Hefei, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Online Bayesian Belief Update and Policy Selection in GSCU
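The paper's Algorithm 1 is not reproduced here. As a minimal sketch of the general idea named in its title, the snippet below shows a generic online Bayesian belief update over a finite set of candidate opponent models and a confidence-based switch between a greedy and a conservative policy. All names and the threshold value are illustrative assumptions, not the paper's actual algorithm or API.

```python
import numpy as np

def bayes_update(belief, likelihoods):
    """One step of Bayesian belief updating over K candidate opponent models.

    belief:      prior probabilities over the models, shape (K,)
    likelihoods: P(observed opponent action | model k) for each model, shape (K,)
    """
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:  # guard against numerical underflow
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

def select_policy(belief, greedy_policy, conservative_policy, confidence_threshold=0.8):
    """Play greedily when the belief is concentrated, conservatively otherwise.

    `confidence_threshold` is an illustrative value, not taken from the paper.
    """
    if belief.max() >= confidence_threshold:
        return greedy_policy        # exploit the most likely opponent model
    return conservative_policy      # fall back to a safe (e.g., equilibrium) policy

# Toy usage with three hypothetical opponent models.
belief = np.array([1 / 3, 1 / 3, 1 / 3])
observed_likelihoods = np.array([0.7, 0.2, 0.1])  # P(opponent action | model k)
belief = bayes_update(belief, observed_likelihoods)
policy = select_policy(belief, greedy_policy="best_response", conservative_policy="safe")
```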
Open Source Code | Yes | The code is available online at https://github.com/YeTianJHU/GSCU.
Open Datasets | Yes | We consider two competitive multiagent benchmarks: Kuhn poker (Kuhn, 2016) and gridworld Predator Prey (PP) (Mordatch & Abbeel, 2018).
Dataset Splits | No | The paper describes 'Training and test protocols' and uses training data, but it does not explicitly provide details of a separate validation split (e.g., percentages, counts, or how it was used for hyperparameter tuning) beyond referring to training and test phases.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or memory specifications used for running the experiments. It focuses on software parameters and experimental protocols.
Software Dependencies | No | The paper mentions software components such as 'Adam optimizer', 'PPO', and 'LSTM' in its implementation details. However, it does not provide specific version numbers for any of these software dependencies (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, or specific library versions like 'scikit-learn 0.24.2').
Experiment Setup | Yes | The hyperparameters used for training on Kuhn poker (Parameter: Range, Best):
  Shared — Learning rate: {1e-4, 5e-4}, best 5e-4; Batch size: 1000; Number of PPO updates per batch: {5, 10}, best 5; PPO clip ratio: 0.2; Training episodes: 300000; Discount factor (γ): 0.99.
  Policy2Emb — Learning rate: 1e-3; Maximal value for β: {0.01, 0.1}, best 0.01; Number of annealing cycles: 2.
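To make the listed hyperparameters easier to reuse, they can be transcribed into a plain configuration dictionary. The grouping and key names below are our own; only the values come from the row above.

```python
# Kuhn poker training hyperparameters transcribed from the Experiment Setup row.
# Dictionary layout and key names are illustrative, not from the paper.
KUHN_POKER_CONFIG = {
    "shared": {
        "learning_rate": 5e-4,          # best of {1e-4, 5e-4}
        "batch_size": 1000,
        "ppo_updates_per_batch": 5,     # best of {5, 10}
        "ppo_clip_ratio": 0.2,
        "training_episodes": 300_000,
        "discount_factor": 0.99,        # gamma
    },
    "policy2emb": {
        "learning_rate": 1e-3,
        "beta_max": 0.01,               # best of {0.01, 0.1}
        "annealing_cycles": 2,
    },
}
```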