Autonomous Reinforcement Learning via Subgoal Curricula
Authors: Archit Sharma, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark VaPRL on several robotic control tasks in the persistent RL setting against state-of-the-art methods, which either simulate the initial state distribution by learning a reset controller, or incrementally grow the state space from which the given task can be solved. Our experiments indicate that using a tailored curriculum generated by VaPRL can be up to 30% more sample-efficient in acquiring task behaviors compared to these prior methods. |
| Researcher Affiliation | Collaboration | Archit Sharma, Abhishek Gupta#, Sergey Levine#, Karol Hausman, Chelsea Finn (Stanford University, Google Brain, # UC Berkeley); {architsh,cbfinn}@stanford.edu, {abhishekunique,slevine,karolhausman}@google.com |
| Pseudocode | Yes | Algorithm 1: Value-Accelerated Persistent Reinforcement Learning (VaPRL) |
| Open Source Code | No | In the ethics checklist, the authors state: '[No], we will release the code and the environments upon publication.' The provided URL is a project page, not a direct code repository. |
| Open Datasets | No | The paper describes using simulated environments (table-top rearrangement, Sawyer door closing, hand manipulation) and providing 'a small set of trajectories' or 'demonstrations' to the algorithms. It does not provide concrete access information (link, DOI, formal citation) for a publicly available dataset used for training. While it references 'Meta-world', it does not specify how the data derived from it can be publicly accessed. |
| Dataset Splits | No | The paper discusses a training environment M_T and an evaluation environment M_E, but it does not explicitly mention a validation set or provide specific training/validation/test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The main body of the paper does not specify the hardware used (e.g., GPU models, CPU types, memory). While the ethics checklist indicates this information is in the Appendix, the Appendix itself is not provided in the analyzed text. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic [17] as the base RL algorithm' and refers to 'Tensorflow agents [18]', but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper states: 'Further details about problem setup, demonstrations, implementation, hyperparameters and evaluation metrics can be found in the Appendix.' This indicates that specific experimental setup details, such as hyperparameters, are not present in the main text provided. |
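Algorithm 1 (VaPRL) is referenced in the Pseudocode row above but is not reproduced in the analyzed text. The sketch below is a minimal, illustrative reconstruction of the general idea summarized in the Research Type row, assuming a learned goal-conditioned value function and a small set of demonstration states; the function name `select_subgoal`, the threshold `epsilon`, and the distance-based tie-breaking are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def select_subgoal(value_fn, demo_states, initial_state, task_goal, epsilon=0.3):
    """Hypothetical sketch of value-based subgoal curriculum selection.

    Among demonstration states whose estimated value for reaching the task
    goal exceeds `epsilon`, return the one closest to the initial state, so
    the curriculum gradually moves subgoals toward the states the agent must
    handle in the evaluation setting. Falls back to the task goal if no
    demonstration state qualifies. This is NOT the paper's Algorithm 1.
    """
    candidates = [s for s in demo_states if value_fn(s, task_goal) >= epsilon]
    if not candidates:
        return task_goal
    distances = [np.linalg.norm(np.asarray(s) - np.asarray(initial_state))
                 for s in candidates]
    return candidates[int(np.argmin(distances))]


if __name__ == "__main__":
    # Toy 2-D states, purely illustrative.
    demo_states = [np.array([0.2, 0.0]), np.array([0.5, 0.1]), np.array([0.9, 0.4])]
    task_goal = np.array([1.0, 0.5])
    initial_state = np.array([0.0, 0.0])
    # Stand-in for a learned goal-conditioned value estimate.
    value_fn = lambda s, g: 1.0 - min(np.linalg.norm(s - g), 1.0)
    print(select_subgoal(value_fn, demo_states, initial_state, task_goal))
```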