Continual Learning of Control Primitives: Skill Discovery via Reset-Games
Authors: Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we aim to experimentally answer the following questions: (1) How does our approach compare to prior methods in the reset free domain? (2) How do the skills learned by our approach compare to prior work? Specifically, can we learn better hierarchical controllers using the primitives learned by our approach (LSR)? We first describe our experimental setup, evaluation metrics, and the prior methods to which we will compare. |
| Researcher Affiliation | Academia | Kelvin Xu¹, Siddharth Verma¹, Chelsea Finn², Sergey Levine¹ (¹UC Berkeley, ²Stanford University) |
| Pseudocode | Yes | Algorithm 1 Learning Skillful Resets (LSR) |
| Open Source Code | Yes | Code is available at https://github.com/siddharthverma314/adversarial.git |
| Open Datasets | Yes | To study reset-free learning, we use the three-fingered hand repositioning task proposed by Zhu et al. [54]... To evaluate our second experimental hypothesis, we evaluate skills acquired with reset-free learning for an ant locomotion task... |
| Dataset Splits | No | The paper describes experimental setups and evaluation metrics for different environments (DClaw and Ant), but does not explicitly provide specific percentages or counts for training, validation, and test data splits. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory specifications) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic (SAC) [22]' and 'random network distillation [7]' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper states: 'We leave detailed discussion of hyperparameters and environment parameters to the Appendix 6.' These details are therefore not provided in the main body of the paper. |