Characterizing the Gap Between Actor-Critic and Policy Gradient

Authors: Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods." "We present empirical results that show these modifications can improve sample efficiency and final performance in both a tabular domain as well as continuous control tasks which require neural networks to approximate the actor and critic."
Researcher Affiliation | Collaboration | ¹Department of Computing Science, University of Alberta, Edmonton, Canada; ²Stanford University; ³Google Brain.
Pseudocode | Yes | Appendix D.3, "Pseudocode and Hyperparameters".
Open Source Code | No | The paper does not explicitly state that the source code for the method is released, nor does it provide a link to a repository.
Open Datasets | Yes | "We conduct experiments within both the Four Room domain, an illustrative discrete action space environment (see Appendix D), and three continuous control environments: Pendulum-v0, Reacher-v2, and HalfCheetah-v2." Reacher-v2 and HalfCheetah-v2 use the MuJoCo physics engine (Todorov et al., 2012).
Dataset Splits | No | The paper mentions running experiments with different random seeds (e.g., "3 runs", "5 runs") but does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not give hardware details (e.g., GPU/CPU models, memory) for its experiments, noting only that the continuous control tasks "require neural networks to approximate the actor and critic".
Software Dependencies | No | The paper mentions using the MuJoCo physics engine (Todorov et al., 2012) and modifying Soft Actor-Critic (SAC) (Haarnoja et al., 2018a), but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "Additional training details, including hyper-parameter settings and pseudocode, and additional experimental results are in Appendix D." "For continuous environments, we use a different set of hyperparameters, which are listed in Table 2."