Characterizing the Gap Between Actor-Critic and Policy Gradient
Authors: Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods. We present empirical results that show these modifications can improve sample efficiency and final performance in both a tabular domain as well as continuous control tasks which require neural networks to approximate the actor and critic. |
| Researcher Affiliation | Collaboration | (1) Department of Computing Science, University of Alberta, Edmonton, Canada; (2) Stanford University; (3) Google Brain. |
| Pseudocode | Yes | Appendix D.3 Pseudocode and Hyperparameters |
| Open Source Code | No | The paper neither states that the source code for its method is released nor provides a link to a repository. |
| Open Datasets | Yes | We conduct experiments within both the Four Room domain, an illustrative discrete action space environment (see Appendix D), and three continuous control environments: Pendulum-v0, Reacher-v2, and HalfCheetah-v2. The environments Reacher-v2 and HalfCheetah-v2 use the MuJoCo physics engine (Todorov et al., 2012). |
| Dataset Splits | No | The paper mentions running experiments with different random seeds (e.g., '3 runs', '5 runs') but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments; it notes only that the continuous control tasks 'require neural networks to approximate the actor and critic'. |
| Software Dependencies | No | The paper mentions using the 'MuJoCo physics engine (Todorov et al., 2012)' and modifying 'Soft Actor-Critic (SAC) (Haarnoja et al., 2018a)' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Additional training details, including hyper-parameter settings and pseudocode, and additional experimental results are in Appendix D. For continuous environments, we use a different set of hyperparameters, which are listed in Table 2. |