reproducibilityindex.ai

Provably sample-efficient RL with side information about latent dynamics

Authors: Yao Liu, Dipendra Misra, Miro Dudik, Robert E. Schapire

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In synthetic experiments, we verify various properties of our algorithm and compare it with several transfer RL algorithms that require access to full simulators (i.e., those that also simulate observations).
Researcher Affiliation	Industry	Yao Liu Amazon Web Services yaoliuai@amazon.com Dipendra Misra Microsoft Research dipendra.misra@microsoft.com Miroslav Dudík Microsoft Research mdudik@microsoft.com Robert E. Schapire Microsoft Research schapire@microsoft.com
Pseudocode	Yes	Algorithm 1 Robust Dynamic Programming. RDP(M, η) ... Algorithm 2 Transfer from Abstract Simulator using Inverse Dynamics. TASID(M, M, F, η, ϵ, δ)
Open Source Code	No	The paper does not provide any explicit statements about releasing source code for the described methodology or links to a code repository.
Open Datasets	Yes	We evaluate TASID in the visual MiniGrid environment [Chevalier-Boisvert et al., 2018] with noisy observations.
Dataset Splits	No	The paper mentions running a grid search over hyperparameters and evaluating in simulation environments, but does not specify explicit train/validation/test dataset splits or cross-validation setup for its data.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running its experiments.
Software Dependencies	No	The paper mentions algorithms like PPO and environments like MiniGrid, but it does not provide specific version numbers for any software dependencies or libraries required for reproduction.
Experiment Setup	Yes	For baseline algorithms, we run grid search over hyperparameters listed in Table 3 in Appendix D, separately for each environment specification (each value of H), and report the best results of PPO(+RND)(+DR). For TASID, we consider only one hyperparameter, the number of training episodes per time step n_D, and search over three possible values: 1000, 2500, 10000.
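
The single-hyperparameter search described in the Experiment Setup row can be illustrated with a minimal sketch. Only the three candidate values for n_D come from the text above. The function names select_best and toy_evaluate are hypothetical, and toy_evaluate is a placeholder score, not the paper's training or evaluation procedure, since the paper's code is not released.

```python
from typing import Callable, Dict, Iterable, Tuple

# Candidate values for n_D, as reported in the Experiment Setup row.
N_D_CANDIDATES = (1000, 2500, 10000)


def select_best(
    candidates: Iterable[int],
    evaluate: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Score every candidate and return the best one plus all scores."""
    scores = {c: evaluate(c) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_evaluate(n_d: int) -> float:
    # Hypothetical stand-in for "train with n_D episodes per time step and
    # measure the resulting return". This is NOT the paper's evaluation.
    return 1.0 - 1.0 / n_d


best_n_d, all_scores = select_best(N_D_CANDIDATES, toy_evaluate)
print(best_n_d, all_scores)
```

In a real reproduction, toy_evaluate would be replaced by a full training-and-evaluation run, repeated separately for each environment specification (each value of H), as the row above states for the baselines.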