Greedy when Sure and Conservative when Uncertain about the Opponents
Authors: Haobo Fu, Ye Tian, Hongxiang Yu, Weiming Liu, Shuang Wu, Jiechao Xiong, Ying Wen, Kai Li, Junliang Xing, Qiang Fu, Wei Yang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on popular benchmarks demonstrate GSCU's superiority over the state-of-the-art methods. The goal of the experimental study is to test the performance of different methods when competing online against unknown and nonstationary opponents. We also validate the effectiveness of each component in GSCU. |
| Researcher Affiliation | Collaboration | 1Tencent AI Lab, Shenzhen, China 2Shanghai Jiao Tong University, Shanghai, China 3University of Science and Technology of China, Hefei, China 4Institute of Automation, Chinese Academy of Sciences, Beijing, China 5School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 6Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Online Bayesian Belief Update and Policy Selection in GSCU (a hedged sketch of the idea follows the table). |
| Open Source Code | Yes | The code is available online at https://github.com/YeTianJHU/GSCU. |
| Open Datasets | Yes | We consider two competitive multiagent benchmarks: Kuhn poker (Kuhn, 2016) and gridworld Predator Prey (PP) (Mordatch & Abbeel, 2018). |
| Dataset Splits | No | The paper describes 'Training and test protocols' and uses training data, but it does not explicitly provide details about a separate validation dataset split (e.g., percentages, counts, or how it was used for hyperparameter tuning) beyond referring to training and test phases. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or memory specifications used for running the experiments. It focuses on software parameters and experimental protocols. |
| Software Dependencies | No | The paper mentions software components such as 'Adam optimizer', 'PPO', and 'LSTM' in its implementation details. However, it does not provide specific version numbers for any of these software dependencies (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, or specific library versions like 'scikit-learn 0.24.2'). |
| Experiment Setup | Yes | The hyperparameters used for training on Kuhn poker (transcribed into a config sketch after this table). Shared: learning rate {1e-4, 5e-4}, best 5e-4; batch size 1000; number of PPO updates per batch {5, 10}, best 5; PPO clip ratio 0.2; training episodes 300000; discount factor (γ) 0.99. Policy2Emb: learning rate 1e-3; maximal value for β {0.01, 0.1}, best 0.01; number of annealing cycles 2. |
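
The Pseudocode row names Algorithm 1, Online Bayesian Belief Update and Policy Selection. Below is a minimal sketch of that idea, assuming a small set of candidate opponent models, one precomputed best response per model, a fixed conservative (approximate-Nash) policy, and a 0.8 confidence threshold; all of these names and values are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of "greedy when sure, conservative when uncertain":
# maintain a Bayesian belief over candidate opponent models and play the
# greedy best response only once the belief is concentrated enough.
# All components below are hypothetical stand-ins.
import numpy as np

# Hypothetical: each opponent model is a distribution over 3 opponent actions.
opponent_models = np.array([
    [0.7, 0.2, 0.1],   # model 0: mostly plays action 0
    [0.1, 0.8, 0.1],   # model 1: mostly plays action 1
    [0.3, 0.3, 0.4],   # model 2: close to uniform
])
# Hypothetical: one precomputed best-response policy per opponent model,
# plus a fixed safe policy (e.g., an approximate Nash equilibrium).
greedy_best_responses = ["br_to_model_0", "br_to_model_1", "br_to_model_2"]
conservative_policy = "approx_nash"

belief = np.full(len(opponent_models), 1.0 / len(opponent_models))  # uniform prior

def bayes_update(belief, observed_action):
    """One Bayes step: reweight each candidate opponent model by the
    likelihood it assigns to the opponent's observed action."""
    posterior = belief * opponent_models[:, observed_action]
    return posterior / posterior.sum()

def select_policy(belief, threshold=0.8):
    """Greedy when sure: best-respond to the most likely opponent model
    once the belief is concentrated; conservative when uncertain."""
    if belief.max() >= threshold:
        return greedy_best_responses[int(belief.argmax())]
    return conservative_policy

for a in [1, 1, 1]:  # the opponent keeps playing action 1
    belief = bayes_update(belief, a)
    print(select_policy(belief), belief.round(3))
# After one observation the belief is still diffuse (conservative policy);
# after two it exceeds the threshold and the greedy best response is chosen.
```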
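
For convenience, the Kuhn poker hyperparameters reported in the Experiment Setup row can be transcribed into a single config dict. The key names and nesting are assumptions for readability; only the values and search ranges come from the paper.

```python
# Reported GSCU training hyperparameters for Kuhn poker, transcribed from
# the table above. Key names are illustrative, not from the authors' code.
kuhn_poker_config = {
    "shared": {
        "learning_rate": 5e-4,        # best of searched range {1e-4, 5e-4}
        "batch_size": 1000,
        "ppo_updates_per_batch": 5,   # best of searched range {5, 10}
        "ppo_clip_ratio": 0.2,
        "training_episodes": 300_000,
        "discount_factor_gamma": 0.99,
    },
    "policy2emb": {
        "learning_rate": 1e-3,
        "beta_max": 0.01,             # best of searched range {0.01, 0.1}
        "annealing_cycles": 2,
    },
}
```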