Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Smoothed Action Value Functions for Learning Gaussian Policies
Authors: Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a number of evaluations of Smoothie to compare to DDPG. We choose DDPG as a baseline because it (1) utilizes gradient information of a Q-value approximator, much like the proposed algorithm; and (2) is a standard algorithm well-known to achieve good, sample-efficient performance on continuous control benchmarks. |
| Researcher Affiliation | Collaboration | 1Google Brain 2Department of Computing Science, University of Alberta. |
| Pseudocode | Yes | Algorithm 1 Smoothie |
| Open Source Code | No | The paper does not include an unambiguous statement of code release or a direct link to a source-code repository for the described methodology. |
| Open Datasets | Yes | We consider standard continuous control benchmarks available on OpenAI Gym (Brockman et al., 2016) utilizing the MuJoCo environment (Todorov et al., 2012). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym and MuJoCo but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each task we performed a hyperparameter search over actor learning rate, critic learning rate and reward scale... Additional implementation details are provided in the Appendix. |
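The paper's central object is the smoothed action-value function, the expectation of Q under Gaussian perturbation of the action: Q̃(s, a) = E_{a′ ~ N(a, Σ)}[Q(s, a′)]. As a rough illustration only (not the authors' implementation; `q_fn`, `smoothed_q`, and the toy quadratic Q are hypothetical names introduced here), a Monte Carlo estimate of such a smoothed value can be sketched as:

```python
import numpy as np

def smoothed_q(q_fn, state, action, cov, n_samples=1000, rng=None):
    """Monte Carlo estimate of a smoothed action value:
    Q~(s, a) = E_{a' ~ N(a, cov)}[Q(s, a')].
    (Illustrative sketch, not the paper's algorithm.)
    """
    rng = np.random.default_rng(rng)
    # Draw perturbed actions around the mean action and average Q over them.
    perturbed = rng.multivariate_normal(action, cov, size=n_samples)
    return float(np.mean([q_fn(state, a) for a in perturbed]))

# Toy quadratic Q-value for checking: with a' ~ N(a, sigma^2 I) in d dims,
# E[-||a'||^2] = -||a||^2 - d * sigma^2.
q = lambda s, a: -np.dot(a, a)
est = smoothed_q(q, None, np.zeros(2), 0.25 * np.eye(2),
                 n_samples=5000, rng=0)
# Analytic smoothed value here is -(0 + 2 * 0.25) = -0.5.
```

The toy quadratic gives a closed-form check on the estimator; in the paper this smoothing is what makes gradients of the value with respect to a Gaussian policy's mean and covariance available.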