Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Authors: Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin Yang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we instantiate our framework on a class of hard exploration problems to demonstrate the practicality of our theory.
Researcher Affiliation | Academia | Fei Feng, University of California, Los Angeles (fei.feng@math.ucla.edu); Ruosong Wang, Carnegie Mellon University (ruosongw@andrew.cmu.edu); Wotao Yin, University of California, Los Angeles (wotaoyin@math.ucla.edu); Simon S. Du, University of Washington (ssdu@cs.washington.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu)
Pseudocode | Yes | Algorithm 1: A Unified Framework for Unsupervised RL; Algorithm 2: Trajectory Sampling Routine TSR(ULO, π, B); Algorithm 3: Fix Label(f[H+1], Z)
Open Source Code | Yes | Our code is available at https://github.com/FlorenceFeng/StateDecoding.
Open Datasets | No | We conduct experiments in two environments: Lock Bernoulli and Lock Gaussian. These environments, also studied in Du et al. (2019a), are designed to be hard for exploration.
Dataset Splits | No | The paper describes custom-built environments (Lock Bernoulli and Lock Gaussian) in which data is generated episodically, but it does not specify explicit training, validation, or test splits (e.g., percentages or counts) or provide a method to reproduce such splits from a static dataset.
Hardware Specification | No | The paper provides no specific hardware details, such as GPU models, CPU models, or memory specifications, for running the experiments.
Software Dependencies | No | The paper mentions no specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | No | The paper states: 'Details about hyperparameters and unsupervised learning oracles in URL can be found in Appendix C.' It thus defers the specific experimental setup details to a supplemental appendix rather than providing them in the main text.