Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Smoothed Action Value Functions for Learning Gaussian Policies
Authors: Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a number of evaluations of Smoothie to compare to DDPG. We choose DDPG as a baseline because it (1) utilizes gradient information of a Q-value approximator, much like the proposed algorithm; and (2) is a standard algorithm well-known to achieve good, sample-efficient performance on continuous control benchmarks. |
| Researcher Affiliation | Collaboration | 1Google Brain 2Department of Computing Science, University of Alberta. |
| Pseudocode | Yes | Algorithm 1 Smoothie |
| Open Source Code | No | The paper does not include an unambiguous statement of code release or a direct link to a source-code repository for the described methodology. |
| Open Datasets | Yes | We consider standard continuous control benchmarks available on OpenAI Gym (Brockman et al., 2016) utilizing the MuJoCo environment (Todorov et al., 2012). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym and MuJoCo but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each task we performed a hyperparameter search over actor learning rate, critic learning rate and reward scale... Additional implementation details are provided in the Appendix. |
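The paper's central object is the smoothed action-value function, the expectation of Q under Gaussian perturbation of the action: Q̃(s, a) = E_{a′ ~ N(a, Σ)}[Q(s, a′)]. As a rough illustration only (not the authors' implementation; `q_fn`, `smoothed_q`, and the toy quadratic Q are hypothetical names introduced here), a Monte Carlo estimate of such a smoothed value can be sketched as:

```python
import numpy as np

def smoothed_q(q_fn, state, action, cov, n_samples=1000, rng=None):
    """Monte Carlo estimate of a smoothed action value:
    Q~(s, a) = E_{a' ~ N(a, cov)}[Q(s, a')].
    (Illustrative sketch, not the paper's algorithm.)
    """
    rng = np.random.default_rng(rng)
    # Draw perturbed actions around the mean action and average Q over them.
    perturbed = rng.multivariate_normal(action, cov, size=n_samples)
    return float(np.mean([q_fn(state, a) for a in perturbed]))

# Toy quadratic Q-value for checking: with a' ~ N(a, sigma^2 I) in d dims,
# E[-||a'||^2] = -||a||^2 - d * sigma^2.
q = lambda s, a: -np.dot(a, a)
est = smoothed_q(q, None, np.zeros(2), 0.25 * np.eye(2),
                 n_samples=5000, rng=0)
# Analytic smoothed value here is -(0 + 2 * 0.25) = -0.5.
```

The toy quadratic gives a closed-form check on the estimator; in the paper this smoothing is what makes gradients of the value with respect to a Gaussian policy's mean and covariance available.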