Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

Authors: Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that BRIEE is more sample efficient than the state-of-the-art Block MDP algorithm HOMER and other empirical RL baselines on challenging rich-observation combination lock problems which require deep exploration.
Researcher Affiliation | Collaboration | Princeton University, Carnegie Mellon University, Cornell University, Google Research.
Pseudocode | Yes | Algorithm 1 Block-structured Representation learning with Interleaved Explore Exploit (BRIEE) ... Algorithm 2 Representation Learning Oracle (REPLEARN) ... Algorithm 3 Least Square Value Iteration (LSVI) (see the sketch below the table for how these routines interleave).
Open Source Code | Yes | Our code can be found at https://github.com/yudasong/briee.
Open Datasets | No | The paper describes a custom 'diabolical combination lock (comblock)' environment used for evaluation. While it notes that this setting is motivated by a prior benchmark (Misra et al., 2019), it does not provide a direct link, DOI, or formal citation for accessing a publicly available dataset used for training.
Dataset Splits | No | The paper discusses per-timestep replay buffers D_h and D'_h for data collection and learning, as well as 'evaluation runs', but does not specify a formal validation split (e.g., 80/10/10 percentages or absolute counts) from a static dataset.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions baseline methods such as PPO, RND, and LSVI-UCB, but does not provide specific version numbers for their implementations or for any ancillary libraries.
Experiment Setup | Yes | We provide the full list of hyperparameters in Table 2. ... We provide the hyperparameters of BRIEE for the dense reward environment in Table 6 ... We provide the hyperparameters of PPO for the dense reward environment in Table 7.
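
The three algorithms quoted in the Pseudocode row interact in a single training loop: a representation learning oracle (REPLEARN) fits per-timestep features from replay buffers, and least-squares value iteration (LSVI) with exploration bonuses on those features produces the next exploration policies. Below is a minimal Python sketch of that interleaving, assuming hypothetical helpers collect_at_step, rep_learn, and lsvi_with_bonus; it is an illustrative reading of the algorithm structure, not the authors' implementation (their code is in the linked repository).

```python
# Minimal sketch of the interleaved explore-exploit loop (Algorithms 1-3).
# All helper names and the data layout are illustrative assumptions.
def briee_sketch(env, horizon, num_iterations, latent_dim):
    D = [[] for _ in range(horizon)]        # replay buffer D_h per timestep h
    D_prime = [[] for _ in range(horizon)]  # second replay buffer D'_h per timestep h
    policies = [None] * horizon             # current exploration policies
    phis = [None] * horizon                 # learned feature maps phi_h

    for _ in range(num_iterations):
        # Roll in with the current exploration policies (random behaviour on
        # the first pass) and append the observed transitions to the buffers.
        for h in range(horizon):
            d, d_prime = collect_at_step(env, policies, h)
            D[h] += d
            D_prime[h] += d_prime

        # Representation learning oracle (Algorithm 2, REPLEARN): fit a
        # feature map phi_h for each timestep from the replay buffers.
        phis = [rep_learn(D[h], D_prime[h], latent_dim) for h in range(horizon)]

        # Least-squares value iteration with exploration bonuses
        # (Algorithm 3, LSVI) on the learned features yields new policies.
        policies = lsvi_with_bonus(phis, D, horizon)

    return phis, policies
```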