Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

Authors: Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines." (Abstract); "7. Proof of Concept Experiments" (section title); "Results. Figure 2b reports the minimum number of episodes needed to achieve a mean return of V(π*)/2 = 2.0. We run each algorithm 5 times with different seeds and for a maximum of 10 million episodes, and we report the median performance." (Section 7, Results; see the evaluation-protocol sketch after the table)
Researcher Affiliation | Industry | Dipendra Misra 1, Mikael Henaff 1, Akshay Krishnamurthy 1, John Langford 1; 1 Microsoft Research, New York, NY.
Pseudocode | Yes | "Algorithm 1 HOMER( , F, N, , , δ): Reinforcement and abstraction learning in a Block MDP" and "Algorithm 2 PSDP( 1:h, R0, h, , n): Optimizing reward function R0 given policy covers 1:h"
Open Source Code | Yes | "Reproducibility. Code and models can be found at https://github.com/cereb-rl." (Section 7, Reproducibility)
Open Datasets | No | The paper uses a custom-designed environment called the "diabolical combination lock" and does not provide concrete access information (link, DOI, or formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper evaluates performance over episodes in a reinforcement learning environment rather than on a static dataset, and it does not mention explicit training/validation/test splits in the traditional supervised-learning sense.
Hardware Specification | No | The acknowledgements thank the "Microsoft Philly Team for providing us with computational resources," but the paper does not give specific details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions software such as PyTorch, PPO, DQN, and OpenAI Baselines, and describes architectural components such as a ReLU non-linearity and Gumbel-Softmax, but it does not specify version numbers for any of these software components or libraries. (A hedged Gumbel-Softmax sketch follows the table.)
Experiment Setup | No | The paper discusses the representation of policies and state abstraction functions and mentions varying N (the abstract state space size) in experiments, along with H = 100 and K = 10. It refers to Appendix H for "full details of the model, optimization and empirical changes," but the main text provides no concrete hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations.
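
The Research Type row quotes an evaluation protocol: for each of 5 seeds, find the first episode at which the mean return reaches V(π*)/2 = 2.0 within a 10-million-episode budget, then report the median over seeds. Below is a minimal Python sketch of that computation; the smoothing window and the placeholder return data are assumptions made for illustration, not details taken from the paper or its released code.

```python
import numpy as np

def episodes_to_threshold(returns, threshold=2.0, window=100):
    """Index of the first episode at which the moving-average return
    reaches `threshold`, or None if it never does. The window size is a
    hypothetical choice; the paper does not state how returns are smoothed."""
    if len(returns) < window:
        return None
    smoothed = np.convolve(returns, np.ones(window) / window, mode="valid")
    hits = np.where(smoothed >= threshold)[0]
    return int(hits[0]) + window if hits.size else None

# Five seeds, each capped at 10 million episodes in the paper (Section 7).
# Placeholder per-episode returns stand in for a real training log.
all_runs = [np.random.rand(10_000) * 4.0 for _ in range(5)]
per_seed = [episodes_to_threshold(r) for r in all_runs]
solved = [e for e in per_seed if e is not None]
# The paper reports the median of this quantity over seeds.
print(np.median(solved) if solved else "threshold never reached")
```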
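
The Software Dependencies row notes that the paper names a Gumbel-Softmax component and a ReLU non-linearity without version or configuration details. As a hedged illustration only, the PyTorch sketch below shows a discrete state-abstraction bottleneck of the kind such a component implies; the class name `StateAbstraction`, the layer sizes, and the temperature are assumptions and do not reproduce the authors' released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateAbstraction(nn.Module):
    """Illustrative encoder mapping an observation to one of N abstract
    states through a Gumbel-Softmax bottleneck (all sizes are assumed)."""

    def __init__(self, obs_dim: int, n_abstract_states: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),  # ReLU non-linearity, as mentioned in the paper
            nn.Linear(hidden, n_abstract_states),
        )

    def forward(self, obs: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.net(obs)
        # Straight-through Gumbel-Softmax: one-hot in the forward pass,
        # differentiable soft samples for the backward pass.
        return F.gumbel_softmax(logits, tau=tau, hard=True)

# Usage: assign a batch of observations to abstract states.
enc = StateAbstraction(obs_dim=128, n_abstract_states=3)
one_hot = enc(torch.randn(8, 128))
print(one_hot.argmax(dim=-1))
```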