PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient
Authors: Kaixin Wang, Daquan Zhou, Jiashi Feng, Shie Mannor
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | However, through an extensive empirical study, we unveil that policy regularization and data diversity are what actually matters. (...) To get a clear understanding of what matters in PPG, we conduct a large-scale empirical study on Procgen. Specifically, we focus on the three aspects in Table 1 and run ablation experiments with proper control of other (possibly confounding) hyperparameters. Our study consists of comprehensive experiments covering all 16 Procgen games in both sample efficiency and generalization setups. |
| Researcher Affiliation | Collaboration | Kaixin Wang 1 Daquan Zhou 2 Jiashi Feng 2 Shie Mannor 1 3 (...) 1Faculty of Electrical and Computer Engineering, Technion, Haifa, Israel 2ByteDance, Singapore 3NVIDIA Research, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 Single-network PPG framework (a hedged sketch of such a loop appears after this table) |
| Open Source Code | No | The paper does not contain any statement about making the source code for their methodology publicly available or provide a link to a code repository. |
| Open Datasets | Yes | The large-scale Procgen benchmark (Cobbe et al., 2020) is used as the testbed of our study. |
| Dataset Splits | No | The paper describes training and testing setups for Procgen but does not explicitly provide training/validation/test dataset splits or percentages for reproduction. |
| Hardware Specification | Yes | All experiments are conducted using Intel Xeon Platinum 8260 CPU and NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019) as a deep learning framework' but does not specify a version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | A. Training Details: For all experiments, we use the residual convolutional networks in IMPALA (Espeholt et al., 2018) and the Adam optimizer (Kingma & Ba, 2015), following previous practices (Cobbe et al., 2020; 2021). We use PyTorch (Paszke et al., 2019) as a deep learning framework. All experiments are conducted using Intel Xeon Platinum 8260 CPU and NVIDIA V100 GPU. The table below lists the default values of the parameters in our experiments. (An illustrative sketch of the IMPALA network follows this table.) |
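To make the pseudocode row concrete, here is a minimal PyTorch sketch of a single-network PPG-style loop: one shared actor-critic network receives PPO-style clipped updates in the policy phase, then distills value targets in the auxiliary phase while a KL behavior-cloning term keeps the policy close to its pre-phase behavior. The toy MLP body, the function names, and the hyperparameters (`clip`, `beta_clone`) are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class SharedActorCritic(nn.Module):
    """Single network with a policy head and a value head (toy MLP body)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.pi_head = nn.Linear(hidden, n_actions)  # action logits
        self.v_head = nn.Linear(hidden, 1)           # state value

    def forward(self, obs):
        h = self.body(obs)
        return self.pi_head(h), self.v_head(h).squeeze(-1)


def policy_phase_step(net, opt, obs, act, old_logp, adv, ret, clip=0.2):
    """One PPO-style clipped policy/value update (policy phase)."""
    logits, value = net(obs)
    ratio = torch.exp(Categorical(logits=logits).log_prob(act) - old_logp)
    surrogate = torch.min(ratio * adv, ratio.clamp(1 - clip, 1 + clip) * adv)
    loss = -surrogate.mean() + 0.5 * F.mse_loss(value, ret)
    opt.zero_grad()
    loss.backward()
    opt.step()


def auxiliary_phase_step(net, opt, obs, ret, old_logits, beta_clone=1.0):
    """Distill value targets; the KL term constrains the policy to its
    pre-phase behavior (auxiliary phase)."""
    logits, value = net(obs)
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(old_logits, dim=-1), reduction="batchmean")
    loss = 0.5 * F.mse_loss(value, ret) + beta_clone * kl
    opt.zero_grad()
    loss.backward()
    opt.step()


# Smoke test with random data; a real run would alternate many policy-phase
# iterations with a few auxiliary epochs over the stored rollouts.
net = SharedActorCritic(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(net.parameters(), lr=5e-4)
obs, act = torch.randn(32, 8), torch.randint(0, 4, (32,))
with torch.no_grad():
    logits, _ = net(obs)
    old_logp = Categorical(logits=logits).log_prob(act)
adv, ret = torch.randn(32), torch.randn(32)
policy_phase_step(net, opt, obs, act, old_logp, adv, ret)
auxiliary_phase_step(net, opt, obs, ret, logits)
```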
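The experiment-setup row cites the residual convolutional network from IMPALA (Espeholt et al., 2018). Below is a minimal sketch of that architecture as commonly applied to 64x64 Procgen frames: three conv sequences with 16/32/32 channels, each a 3x3 convolution, a stride-2 max-pool, and two residual blocks, followed by a 256-unit embedding. The channel counts and embedding size follow the standard small IMPALA variant; any deviation from the authors' exact configuration is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convs with pre-activation ReLUs and an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = self.conv0(F.relu(x))
        out = self.conv1(F.relu(out))
        return x + out


class ConvSequence(nn.Module):
    """Conv -> stride-2 max-pool -> two residual blocks."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.res0 = ResidualBlock(out_ch)
        self.res1 = ResidualBlock(out_ch)

    def forward(self, x):
        x = self.pool(self.conv(x))
        return self.res1(self.res0(x))


class ImpalaCNN(nn.Module):
    """IMPALA-style residual encoder for 64x64 RGB observations."""
    def __init__(self, in_ch=3, channels=(16, 32, 32), embed=256):
        super().__init__()
        seqs, c = [], in_ch
        for out_c in channels:
            seqs.append(ConvSequence(c, out_c))
            c = out_c
        self.seqs = nn.Sequential(*seqs)
        # A 64x64 input is halved three times, leaving 8x8 feature maps.
        self.fc = nn.Linear(c * 8 * 8, embed)

    def forward(self, x):
        x = self.seqs(x)
        x = torch.flatten(F.relu(x), start_dim=1)
        return F.relu(self.fc(x))


encoder = ImpalaCNN()
frames = torch.rand(8, 3, 64, 64)  # a batch of Procgen-sized observations
features = encoder(frames)         # -> shape (8, 256)
```

In a full setup matching the training details quoted above, this encoder would feed policy and value heads and be trained with the Adam optimizer.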