Learning Continuous Control Policies by Stochastic Value Gradients
Authors: Nicolas Heess, Gregory Wayne, David Silver, Timothy Lillicrap, Tom Erez, Yuval Tassa
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains. |
| Researcher Affiliation | Industry | Google DeepMind {heess, gregwayne, davidsilver, countzero, tassa, etom}@google.com |
| Pseudocode | Yes | Algorithm 1 SVG(∞), Algorithm 2 SVG(1) with Replay (a simplified sketch of the SVG(1) update follows the table) |
| Open Source Code | No | The paper provides a link to a video montage (https://youtu.be/PYdL7bcn_cM) but no explicit link or statement about open-sourcing the code used for the described methodology. |
| Open Datasets | No | The paper refers to using environments from the MuJoCo simulator for experiments but does not provide access information (link, DOI, citation) for specific datasets used for training. |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test data splits. It describes continuous interaction with simulation environments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud resources. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator and neural networks but does not specify any software names with version numbers for reproducibility (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | In all cases, we use generic, 2 hidden-layer neural networks with tanh activation functions to represent models, value functions, and policies, with a simulation time step of 0.01 s. |
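
The Experiment Setup row describes the architecture only in prose. Below is a minimal, hedged sketch of what such networks might look like; the use of PyTorch, the hidden-layer width of 100 units, and the 10-D observation / 3-D action dimensions are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

def two_layer_tanh_mlp(in_dim: int, out_dim: int, hidden: int = 100) -> nn.Sequential:
    """Generic 2-hidden-layer tanh network, as described in the Experiment Setup row.
    The hidden width of 100 is an assumption for illustration only."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )

# Hypothetical task with 10-D observations and 3-D actions (dimensions are assumptions).
obs_dim, act_dim = 10, 3
policy = two_layer_tanh_mlp(obs_dim, act_dim)                    # deterministic part of the policy
value_fn = two_layer_tanh_mlp(obs_dim, 1)                        # state-value estimate V(s)
dynamics_model = two_layer_tanh_mlp(obs_dim + act_dim, obs_dim)  # predicts the next state
```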
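
The Pseudocode row points to Algorithm 2, SVG(1) with Replay. The sketch below illustrates only the core one-step value-gradient idea: backpropagate r(s, a) + γV(f(s, a)) through a learned, differentiable model and the policy. It deliberately omits the paper's re-parameterized noise handling, importance weighting, and the separate model/value fitting steps; `reward_fn` and the optimizer setup are assumptions for illustration.

```python
import torch

def svg1_policy_update(policy, dynamics_model, value_fn, reward_fn,
                       states, policy_optimizer, gamma=0.99):
    """Simplified SVG(1)-style policy step on a batch of (replayed) states:
    ascend the gradient of r(s, pi(s)) + gamma * V(f(s, pi(s))) w.r.t. the policy."""
    actions = policy(states)                                   # a = pi(s; theta)
    next_states = dynamics_model(torch.cat([states, actions], dim=-1))
    one_step_value = reward_fn(states, actions) + gamma * value_fn(next_states).squeeze(-1)
    loss = -one_step_value.mean()          # minimize the negative one-step value estimate
    policy_optimizer.zero_grad()
    loss.backward()                        # gradients flow through the model and value networks
    policy_optimizer.step()                # but only policy parameters are in this optimizer
```

Because `policy_optimizer` is built over `policy.parameters()` only, the model and value networks are read but not changed by this step; in the paper they are fitted separately from replayed experience.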