Generalized Weighted Path Consistency for Mastering Atari Games
Authors: Dengwei Zhao, Shikui Tu, Lei Xu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted on the Atari 100k benchmark with 26 games, and GW-PCZero achieves 198% mean human performance, higher than the state-of-the-art Efficient Zero's 194%, while consuming only 25% of the computational resources consumed by Efficient Zero. |
| Researcher Affiliation | Academia | Dengwei Zhao, Shanghai Jiao Tong University (zdwccc@sjtu.edu.cn); Shikui Tu, Shanghai Jiao Tong University (tushikui@sjtu.edu.cn); Lei Xu, Shanghai Jiao Tong University and Guangdong Institute of Intelligence Science and Technology (leixu@sjtu.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Sample Preparation for GW-PCZero; Algorithm 2: Weighted PC target $t_{PC}$ estimation |
| Open Source Code | Yes | The source code is available at https://github.com/CMACH508/GW_PCZero. |
| Open Datasets | Yes | Experiments are conducted on the Atari 100k benchmark with 26 games to evaluate GW-PCZero in diverse environments. |
| Dataset Splits | No | The paper refers to the 'Atari 100k benchmark' and '100k interaction steps' but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | Yes | Experiments are conducted on 4 NVIDIA Tesla A100 GPUs with 16 CPU cores. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set $c_b = 1.0$ and $c_a = 0.1$ in Eq. (16). A total of 32 different random seeds are used. Other hyperparameter settings are the same as Efficient Zero, as summarized in Appendix 5. |
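The Research Type row reports results as a percentage of human performance. On Atari 100k this is conventionally the human-normalized score: each game's raw score is rescaled so that a random policy scores 0 and the human reference scores 1 (100%), and the per-game values are then averaged across the 26 games. The sketch below assumes that standard definition; the function names and the three per-game score triples are made-up placeholders for illustration, not results or code from the paper.

```python
from statistics import mean, median
from typing import Dict, Tuple


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw game score: random policy -> 0.0, human reference -> 1.0."""
    return (agent - random) / (human - random)


def aggregate_hns(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """Return (mean, median) human-normalized score over games.

    `scores` maps a game name to a hypothetical (agent, random, human) triple.
    """
    hns = [human_normalized_score(a, r, h) for a, r, h in scores.values()]
    return mean(hns), median(hns)


if __name__ == "__main__":
    # Placeholder numbers only, chosen to make the arithmetic easy to follow.
    toy_scores = {
        "GameA": (60.0, 10.0, 110.0),   # HNS = 50 / 100 = 0.5
        "GameB": (210.0, 10.0, 110.0),  # HNS = 200 / 100 = 2.0
        "GameC": (110.0, 10.0, 60.0),   # HNS = 100 / 50 = 2.0
    }
    mean_hns, median_hns = aggregate_hns(toy_scores)
    print(f"mean HNS = {mean_hns:.0%}, median HNS = {median_hns:.0%}")
    # prints "mean HNS = 150%, median HNS = 200%"
```

Because the mean can be pulled up by a few games with very large normalized scores, Atari 100k results are often reported with both the mean and the median; the headline 198% versus 194% comparison above is a mean.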