Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation
Authors: Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on several stochastic control tasks confirm the efficacy of our low-rank algorithms. |
| Researcher Affiliation | Academia | Devavrat Shah (EECS, MIT) devavrat@mit.edu; Dogyoon Song (EECS, MIT) dgsong@mit.edu; Zhi Xu (EECS, MIT) zhixu@mit.edu; Yuzhe Yang (EECS, MIT) yuzhe@mit.edu |
| Pseudocode | Yes | We provide a narrative overview of the algorithm; the pseudo-code can be found in Appendix A. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using 'several stochastic control tasks' and that they 'first discretize the spaces into very fine grid and run standard value iteration to obtain a proxy of Q'. However, it does not provide concrete access information (links, DOIs, formal citations) to publicly available datasets used for training. |
| Dataset Splits | No | The paper does not explicitly provide specific details about training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The detailed setup can be found in Appendix H. In short, we first discretize the spaces into a very fine grid and run standard value iteration to obtain a proxy of Q. The proxy has a very small approximate rank in all tasks; we hence use r = 10 for our experiments. As mentioned, we simply select r states and r actions that are far from each other in their respective metric. |
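
The authors' code is not released, so the following is only a minimal NumPy sketch of the two steps summarized in the Experiment Setup row: estimating the approximate rank of a value-iteration Q proxy on a discretized grid, and picking r = 10 anchor states and actions that are far apart in their respective metrics. The toy grids, the synthetic Q proxy, the 99% energy threshold, and the greedy farthest-point heuristic are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def approximate_rank(Q, energy=0.99):
    """Smallest k whose top-k singular values capture `energy` of the total
    spectral energy (one common notion of approximate rank; threshold assumed)."""
    s = np.linalg.svd(Q, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

def farthest_point_anchors(points, r, seed=0):
    """Greedy farthest-point sampling: pick r points that are mutually far
    apart under Euclidean distance. Returns their row indices."""
    rng = np.random.default_rng(seed)
    anchors = [int(rng.integers(len(points)))]
    dists = np.linalg.norm(points - points[anchors[0]], axis=1)
    for _ in range(r - 1):
        nxt = int(np.argmax(dists))
        anchors.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return anchors

# Illustrative usage on a synthetic low-rank Q proxy (random toy data standing
# in for the value-iteration proxy computed on the fine grid).
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(500, 2))    # discretized state grid (toy)
actions = rng.uniform(-1, 1, size=(100, 2))   # discretized action grid (toy)
Q_proxy = states @ rng.normal(size=(2, 2)) @ actions.T  # rank <= 2 toy matrix

print("approximate rank:", approximate_rank(Q_proxy))
r = 10
anchor_states = farthest_point_anchors(states, r, seed=0)
anchor_actions = farthest_point_anchors(actions, r, seed=1)
```

In this sketch the anchor rows and columns of the Q proxy would then feed the low-rank completion step; that step is described in the paper's Appendix A pseudocode and is not reproduced here.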