Smoothed Action Value Functions for Learning Gaussian Policies
Authors: Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a number of evaluations of Smoothie to compare to DDPG. We choose DDPG as a baseline because it (1) utilizes gradient information of a Q-value approximator, much like the proposed algorithm; and (2) is a standard algorithm well-known to achieve good, sample-efficient performance on continuous control benchmarks. |
| Researcher Affiliation | Collaboration | Google Brain; Department of Computing Science, University of Alberta. |
| Pseudocode | Yes | Algorithm 1 Smoothie |
| Open Source Code | No | The paper does not include an unambiguous statement of code release or a direct link to a source-code repository for the described methodology. |
| Open Datasets | Yes | We consider standard continuous control benchmarks available on OpenAI Gym (Brockman et al., 2016) utilizing the MuJoCo environment (Todorov et al., 2012). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like OpenAI Gym and MuJoCo but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For each task we performed a hyperparameter search over actor learning rate, critic learning rate and reward scale... Additional implementation details are provided in the Appendix. |
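
A minimal sketch, not taken from the paper, of how the benchmarks named in the "Open Datasets" row could be instantiated and how the hyperparameter search described in the "Experiment Setup" row (actor learning rate, critic learning rate, reward scale) might be organized. The environment IDs, grid values, and the `train_agent` placeholder are assumptions for illustration only; the paper does not report the exact environments searched in this form or the values swept.

```python
# Hypothetical reproduction sketch: OpenAI Gym MuJoCo environments plus a simple
# grid over the hyperparameters the paper says were searched per task.
import itertools
import gym  # requires gym with MuJoCo (mujoco-py) installed

ENV_IDS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2"]  # assumed benchmark subset

# Illustrative grid; the paper does not list the candidate values.
actor_lrs = [1e-4, 1e-3]
critic_lrs = [1e-4, 1e-3]
reward_scales = [0.1, 1.0]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    for actor_lr, critic_lr, scale in itertools.product(actor_lrs, critic_lrs, reward_scales):
        # train_agent(env, actor_lr, critic_lr, scale)  # placeholder for Smoothie/DDPG training
        pass
    env.close()
```

Since the paper gives no versions for Gym or MuJoCo, any such script would also need to pin those dependencies explicitly to make the setup reproducible.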