Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Generalization and Simplicity in Continuous Control
Authors: Aravind Rajeswaran, Kendall Lowrey, Emanuel V. Todorov, Sham M. Kakade
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of widely studied continuous control tasks, including the gym-v1 benchmarks. The performance of these trained policies is competitive with state-of-the-art results obtained with more elaborate parameterizations such as fully connected neural networks. Furthermore, the standard training and testing scenarios for these tasks are shown to be very limited and prone to over-fitting, thus giving rise to only trajectory-centric policies. Training with a diverse initial state distribution induces more global policies with better generalization. This allows for interactive control scenarios where the system recovers from large on-line perturbations, as shown in the supplementary video. |
| Researcher Affiliation | Academia | University of Washington, Seattle. {aravraj, klowrey, todorov, sham}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Policy Search with Natural Gradient |
| Open Source Code | No | The paper provides a "Project page: https://sites.google.com/view/simple-pol" but does not explicitly state that source code for the described methodology is available there, nor is it a direct link to a code repository. |
| Open Datasets | Yes | As indicated before, we train linear and RBF policies with the natural policy gradient on the popular OpenAI gym-v1 benchmark tasks simulated in MuJoCo [25]. |
| Dataset Splits | No | The paper discusses using trajectories from a previous iteration to fit the value function and prevent overfitting, but it does not specify a formal validation dataset split (e.g., percentages or counts for a validation set). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software environments like "OpenAI gym-v1" and "MuJoCo" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all the results reported in this paper, the same δ = 0.05 was used. |
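The pseudocode and experiment-setup rows above refer to the paper's natural policy gradient with a normalized step size δ. A minimal sketch of that update rule is given below; this is an illustration under stated assumptions, not the authors' implementation, and the `damping` regularizer is an assumed numerical safeguard rather than something taken from the paper.

```python
import numpy as np

def natural_gradient_step(g, F, delta=0.05, damping=1e-8):
    """Compute one natural-gradient parameter increment with normalized step size.

    g      : vanilla policy gradient estimate, shape (d,)
    F      : Fisher information matrix estimate, shape (d, d)
    delta  : normalized step size (the paper reports delta = 0.05 throughout)
    damping: small diagonal regularizer (assumed, for numerical stability)

    Returns alpha * F^{-1} g, where alpha = sqrt(delta / (g^T F^{-1} g)).
    """
    F_damped = F + damping * np.eye(F.shape[0])
    nat_grad = np.linalg.solve(F_damped, g)          # F^{-1} g without explicit inverse
    alpha = np.sqrt(delta / (g @ nat_grad + 1e-12))  # normalized step size
    return alpha * nat_grad

# Illustrative call: with an identity Fisher matrix the natural gradient
# reduces to the vanilla gradient scaled to KL-step sqrt(delta).
step = natural_gradient_step(np.array([1.0, 0.0]), np.eye(2))
```

Solving the linear system rather than inverting `F` directly is the usual numerically stable choice; the normalization by `sqrt(delta / g^T F^{-1} g)` makes the effective step size invariant to the scale of the gradient, which is the property the paper's fixed δ = 0.05 relies on.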