Characterizing the Gap Between Actor-Critic and Policy Gradient

Authors: Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods." "We present empirical results that show these modifications can improve sample efficiency and final performance in both a tabular domain as well as continuous control tasks which require neural networks to approximate the actor and critic."
Researcher Affiliation | Collaboration | ¹Department of Computing Science, University of Alberta, Edmonton, Canada; ²Stanford University; ³Google Brain.
Pseudocode | Yes | Appendix D.3, "Pseudocode and Hyperparameters".
Open Source Code | No | The paper does not explicitly state that the source code for the method is released, nor does it provide a link to a repository.
Open Datasets | Yes | "We conduct experiments within both the Four Room domain, an illustrative discrete action space environment (see Appendix D), and three continuous control environments: Pendulum-v0, Reacher-v2, and HalfCheetah-v2." Reacher-v2 and HalfCheetah-v2 use the MuJoCo physics engine (Todorov et al., 2012).
Dataset Splits | No | The paper mentions running experiments with different random seeds (e.g., "3 runs", "5 runs") but does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not give hardware details (e.g., GPU/CPU models, memory) for its experiments, noting only that the continuous control tasks "require neural networks to approximate the actor and critic".
Software Dependencies | No | The paper mentions using the MuJoCo physics engine (Todorov et al., 2012) and modifying Soft Actor-Critic (SAC) (Haarnoja et al., 2018a), but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "Additional training details, including hyper-parameter settings and pseudocode, and additional experimental results are in Appendix D." "For continuous environments, we use a different set of hyperparameters, which are listed in Table 2."