Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Authors: Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that BRIEE is more sample efficient than the state-of-art Block MDP algorithm HOMER and other empirical RL baselines on challenging rich-observation combination lock problems which require deep exploration. |
| Researcher Affiliation | Collaboration | (1) Princeton University, (2) Carnegie Mellon University, (3) Cornell University, (4) Google Research. |
| Pseudocode | Yes | Algorithm 1 Block-structured Representation learning with Interleaved Explore Exploit (BRIEE) ... Algorithm 2 Representation Learning Oracle (REPLEARN) ... Algorithm 3 Least Square Value Iteration (LSVI) |
| Open Source Code | Yes | Our code can be found at https://github.com/yudasong/briee. |
| Open Datasets | No | The paper describes a custom 'diabolical combination lock (comblock) problem' environment used for evaluation. While it mentions the motivation from a prior benchmark (Misra et al., 2019), it does not provide a direct link, DOI, or formal citation for accessing a publicly available dataset used for training. |
| Dataset Splits | No | The paper discusses 'replay buffers Dh and D h' for data collection and learning, and 'evaluation runs' but does not specify a formal validation split (e.g., 80/10/10 percentages or absolute counts) from a static dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software like PPO, RND, and LSVI-UCB but does not provide specific version numbers for any of these software components or other ancillary libraries. |
| Experiment Setup | Yes | We provide the full list of hyperparameters in Table 2. ... We provide the hyperparameters of BRIEE for the dense reward environment in Table 6 ... We provide the hyperparameters of PPO for the dense reward environment in Table 7. |