Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

Authors: Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines." (Abstract); "7. Proof of Concept Experiments" (section title); "Results. Figure 2b reports the minimum number of episodes needed to achieve a mean return of V(π*)/2 = 2.0. We run each algorithm 5 times with different seeds and for a maximum of 10 million episodes, and we report the median performance." (Section 7, Results; see the evaluation-protocol sketch after the table)
Researcher Affiliation | Industry | Dipendra Misra 1, Mikael Henaff 1, Akshay Krishnamurthy 1, John Langford 1; 1 Microsoft Research, New York, NY.
Pseudocode | Yes | "Algorithm 1 HOMER( , F, N, , , δ): Reinforcement and abstraction learning in a Block MDP" and "Algorithm 2 PSDP( 1:h, R0, h, , n): Optimizing reward function R0 given policy covers 1:h"
Open Source Code | Yes | "Reproducibility. Code and models can be found at https://github.com/cereb-rl." (Section 7, Reproducibility)
Open Datasets | No | The paper uses a custom-designed environment called the "diabolical combination lock" and does not provide concrete access information (link, DOI, or formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper evaluates performance over episodes in a reinforcement learning environment rather than on a static dataset, and it does not mention explicit training/validation/test splits in the traditional supervised-learning sense.
Hardware Specification | No | The acknowledgements thank the "Microsoft Philly Team for providing us with computational resources," but the paper does not give specific details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions software such as PyTorch, PPO, DQN, and OpenAI Baselines, and describes architectural components such as a ReLU non-linearity and Gumbel-Softmax, but it does not specify version numbers for any of these software components or libraries. (A hedged Gumbel-Softmax sketch follows the table.)
Experiment Setup | No | The paper discusses the representation of policies and state abstraction functions and mentions varying N (the abstract state space size) in experiments, along with H = 100 and K = 10. It refers to Appendix H for "full details of the model, optimization and empirical changes," but the main text provides no concrete hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations.
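
The Research Type row quotes an evaluation protocol: for each of 5 seeds, find the first episode at which the mean return reaches V(π*)/2 = 2.0 within a 10-million-episode budget, then report the median over seeds. Below is a minimal Python sketch of that computation; the smoothing window and the placeholder return data are assumptions made for illustration, not details taken from the paper or its released code.

```python
import numpy as np

def episodes_to_threshold(returns, threshold=2.0, window=100):
    """Index of the first episode at which the moving-average return
    reaches `threshold`, or None if it never does. The window size is a
    hypothetical choice; the paper does not state how returns are smoothed."""
    if len(returns) < window:
        return None
    smoothed = np.convolve(returns, np.ones(window) / window, mode="valid")
    hits = np.where(smoothed >= threshold)[0]
    return int(hits[0]) + window if hits.size else None

# Five seeds, each capped at 10 million episodes in the paper (Section 7).
# Placeholder per-episode returns stand in for a real training log.
all_runs = [np.random.rand(10_000) * 4.0 for _ in range(5)]
per_seed = [episodes_to_threshold(r) for r in all_runs]
solved = [e for e in per_seed if e is not None]
# The paper reports the median of this quantity over seeds.
print(np.median(solved) if solved else "threshold never reached")
```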
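
The Software Dependencies row notes that the paper names a Gumbel-Softmax component and a ReLU non-linearity without version or configuration details. As a hedged illustration only, the PyTorch sketch below shows a discrete state-abstraction bottleneck of the kind such a component implies; the class name `StateAbstraction`, the layer sizes, and the temperature are assumptions and do not reproduce the authors' released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateAbstraction(nn.Module):
    """Illustrative encoder mapping an observation to one of N abstract
    states through a Gumbel-Softmax bottleneck (all sizes are assumed)."""

    def __init__(self, obs_dim: int, n_abstract_states: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),  # ReLU non-linearity, as mentioned in the paper
            nn.Linear(hidden, n_abstract_states),
        )

    def forward(self, obs: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        logits = self.net(obs)
        # Straight-through Gumbel-Softmax: one-hot in the forward pass,
        # differentiable soft samples for the backward pass.
        return F.gumbel_softmax(logits, tau=tau, hard=True)

# Usage: assign a batch of observations to abstract states.
enc = StateAbstraction(obs_dim=128, n_abstract_states=3)
one_hot = enc(torch.randn(8, 128))
print(one_hot.argmax(dim=-1))
```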