Generalized Data Distribution Iteration
Authors: Jiajun Fan, Changnan Xiao
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also demonstrate our state-of-the-art (SOTA) performance on Arcade Learning Environment (ALE), wherein our algorithm has achieved 9620.33% mean human normalized score (HNS), 1146.39% median HNS and surpassed 22 human world records using only 200M training frames. Our performance is comparable to Agent57's while we consume 500 times less data. |
| Researcher Affiliation | Collaboration | (1) Tsinghua Shenzhen International Graduate School, Tsinghua University, Beijing, China; (2) ByteDance, Beijing, China. |
| Pseudocode | Yes | Algorithm 2: GDI-I3 Algorithm. Algorithm 3: GDI-H3 Algorithm. Algorithm 4: Bandits Controller. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al., 2013, ALE) |
| Dataset Splits | No | The paper discusses evaluation using multiple environments and seeds, but it does not specify traditional training/validation/test dataset splits with percentages or counts, as is common in supervised learning. The evaluation is typical for reinforcement learning environments where agents interact directly with the game. |
| Hardware Specification | Yes | all the experiment is accomplished using a single CPU with 92 cores and a single Tesla-V100-SXM2-32GB GPU. |
| Software Dependencies | No | The paper mentions software components like 'LSTM' and 'Adam', but it does not provide specific version numbers for any programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Table 5. Hyperparameters for Atari experiments (Parameter: Value). Num. Frames: 200M (2E+8); Replay: 2; Num. Environments: 160; Batch size: 64; Discount (γ): 0.997; Optimizer: Adam; Learning Rate: 5e-4. |
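
The hyperparameters and the HNS metric quoted in the table can be illustrated with a minimal Python sketch. This is not the authors' code, since the paper releases none. The class name, field names, and helper functions are hypothetical, and the formula assumes the standard definition of human normalized score used in the ALE literature: 100 × (agent − random) / (human − random). Only the hyperparameter values come from the Table 5 excerpt above.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class AtariHyperparameters:
    """Values copied from the Table 5 excerpt; field names are hypothetical."""
    num_frames: int = 200_000_000   # 200M (2E+8) training frames
    replay: int = 2
    num_environments: int = 160
    batch_size: int = 64
    discount: float = 0.997         # gamma
    optimizer: str = "Adam"
    learning_rate: float = 5e-4


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """HNS in percent, assuming the standard ALE definition."""
    return 100.0 * (agent - random) / (human - random)


def summarize_hns(per_game_scores):
    """Return (mean HNS, median HNS) over (agent, random, human) tuples."""
    values = [human_normalized_score(a, r, h) for a, r, h in per_game_scores]
    return mean(values), median(values)


if __name__ == "__main__":
    # Made-up raw scores for three imaginary games, NOT results from the paper.
    toy_scores = [(10.0, 10.0, 20.0), (20.0, 10.0, 20.0), (40.0, 10.0, 20.0)]
    mean_hns, median_hns = summarize_hns(toy_scores)
    print(AtariHyperparameters())
    print(f"mean HNS = {mean_hns:.2f}%, median HNS = {median_hns:.2f}%")
```

Reporting both statistics matters because the mean is dominated by a few games with very large normalized scores, which is consistent with the gap between the 9620.33% mean and the 1146.39% median quoted above.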