You Only Live Once: Single-Life Reinforcement Learning
Authors: Annie Chen, Archit Sharma, Sergey Levine, Chelsea Finn
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on several single-life continuous control problems indicate that methods based on our distribution matching formulation are 20-60% more successful because they can more quickly recover from novel states. |
| Researcher Affiliation | Academia | Annie S. Chen¹, Archit Sharma¹, Sergey Levine², Chelsea Finn¹; ¹Stanford University, ²UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Q-WEIGHTED ADVERSARIAL LEARNING (QWALE) |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Included in the supplemental material. |
| Open Datasets | No | For all four environments, we evaluate SLRL using data collected through RL as our prior data. More specifically, we run SAC in the source MDP in the standard episodic RL setting for K steps and take the last 50,000 transitions as the prior data. |
| Dataset Splits | No | The paper describes prior data and online experience but does not specify explicit training/validation/test splits of a static dataset. |
| Hardware Specification | No | The main text does not include specific hardware details. The ethics checklist states that compute resources are 'Included in the supplemental material' but does not specify them in the paper itself. |
| Software Dependencies | No | The paper refers to algorithms and methods like SAC and GAIL, but does not provide specific version numbers for any software libraries, frameworks, or dependencies. |
| Experiment Setup | No | For details such as network architecture and hyperparameters, see Appendix A. |