Smoothed Action Value Functions for Learning Gaussian Policies

Authors: Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a number of evaluations of Smoothie to compare to DDPG. We choose DDPG as a baseline because it (1) utilizes gradient information of a Q-value approximator, much like the proposed algorithm; and (2) is a standard algorithm well-known to achieve good, sample-efficient performance on continuous control benchmarks.
Researcher Affiliation | Collaboration | 1 Google Brain; 2 Department of Computing Science, University of Alberta.
Pseudocode | Yes | Algorithm 1: Smoothie
Open Source Code | No | The paper does not include an unambiguous statement of code release or a direct link to a source-code repository for the described methodology.
Open Datasets | Yes | We consider standard continuous control benchmarks available on OpenAI Gym (Brockman et al., 2016) utilizing the MuJoCo environment (Todorov et al., 2012). (An environment-loading sketch is given after the table.)
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software like OpenAI Gym and MuJoCo but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For each task we performed a hyperparameter search over actor learning rate, critic learning rate and reward scale... Additional implementation details are provided in the Appendix. (A sketch of such a sweep is given after the table.)
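
The continuous-control benchmarks cited in the Open Datasets row are loaded through OpenAI Gym with the MuJoCo backend. Below is a minimal sketch of that setup; the specific task name (HalfCheetah-v1) and the pre-0.26 Gym reset/step API are assumptions, since this report does not list the exact environments or library versions used.

```python
# Minimal sketch: loading a MuJoCo continuous-control benchmark via OpenAI Gym.
# The environment name and the old (4-tuple step, obs-only reset) Gym API are
# assumptions, not details given in the report.
import gym
import numpy as np

env = gym.make("HalfCheetah-v1")  # hypothetical choice of benchmark task

obs = env.reset()
total_reward = 0.0
for _ in range(1000):
    # Random Gaussian action clipped to the action bounds; in the paper's
    # experiments the action would come from the learned Gaussian policy.
    action = np.clip(np.random.randn(*env.action_space.shape),
                     env.action_space.low, env.action_space.high)
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
env.close()
```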
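
The Experiment Setup row quotes a per-task hyperparameter search over actor learning rate, critic learning rate, and reward scale. The sketch below shows one way such a sweep could be organized; the candidate values and the train_and_evaluate helper are hypothetical placeholders, as the actual ranges and training loop are only described in the paper's appendix.

```python
# Minimal sketch of a grid search over the three hyperparameters named in the
# report. Candidate values and the training function are placeholders.
import itertools
import random

def train_and_evaluate(actor_lr, critic_lr, reward_scale):
    """Placeholder for training Smoothie (or DDPG) with one configuration and
    returning its average evaluation return."""
    return random.random()  # stand-in score so the sketch runs end to end

actor_lrs = [1e-4, 1e-3]        # assumed candidate values
critic_lrs = [1e-4, 1e-3]       # assumed candidate values
reward_scales = [0.1, 1.0]      # assumed candidate values

results = {}
for actor_lr, critic_lr, reward_scale in itertools.product(
        actor_lrs, critic_lrs, reward_scales):
    score = train_and_evaluate(actor_lr, critic_lr, reward_scale)
    results[(actor_lr, critic_lr, reward_scale)] = score

best = max(results, key=results.get)
print("best (actor_lr, critic_lr, reward_scale):", best)
```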