Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Authors: Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory. |
| Researcher Affiliation | Academia | Fei Feng (University of California, Los Angeles, fei.feng@math.ucla.edu); Ruosong Wang (Carnegie Mellon University, ruosongw@andrew.cmu.edu); Wotao Yin (University of California, Los Angeles, wotaoyin@math.ucla.edu); Simon S. Du (University of Washington, ssdu@cs.washington.edu); Lin F. Yang (University of California, Los Angeles, linyang@ee.ucla.edu) |
| Pseudocode | Yes | Algorithm 1: A Unified Framework for Unsupervised RL; Algorithm 2: Trajectory Sampling Routine TSR(ULO, π, B); Algorithm 3: FixLabel(f[H+1], Z) |
| Open Source Code | Yes | Our code is available at https://github.com/FlorenceFeng/StateDecoding. |
| Open Datasets | No | We conduct experiments in two environments: Lock Bernoulli and Lock Gaussian. These environments, also studied in Du et al. (2019a), are designed to be hard for exploration. |
| Dataset Splits | No | The paper describes custom-built environments (Lock Bernoulli and Lock Gaussian) for which data is generated episodically, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) or provide a method to reproduce such splits from a static dataset. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned in the paper. |
| Experiment Setup | No | The paper states: 'Details about hyperparameters and unsupervised learning oracles in URL can be found in Appendix C.', deferring the specific experimental setup to a supplemental appendix rather than providing it in the main text. |