Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Authors: Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical results on several stochastic control tasks confirm the efficacy of our low-rank algorithms. |
| Researcher Affiliation | Academia | Devavrat Shah (EECS, MIT, devavrat@mit.edu); Dogyoon Song (EECS, MIT, dgsong@mit.edu); Zhi Xu (EECS, MIT, zhixu@mit.edu); Yuzhe Yang (EECS, MIT, yuzhe@mit.edu) |
| Pseudocode | Yes | We provide a narrative overview of the algorithm; the pseudo-code can be found in Appendix A. |
| Open Source Code | No | The paper does not provide a direct link to, or an explicit statement about, open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using 'several stochastic control tasks' and states that the authors 'first discretize the spaces into very fine grid and run standard value iteration to obtain a proxy of Q'. However, it does not provide concrete access information (links, DOIs, or formal citations) for publicly available datasets. |
| Dataset Splits | No | The paper does not provide specific training/validation/test split details (e.g., percentages, sample counts, or citations to predefined splits) needed for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | The detailed setup can be found in Appendix H. In short, we first discretize the spaces into very fine grid and run standard value iteration to obtain a proxy of Q. The proxy has a very small approximate rank in all tasks; we hence use r = 10 for our experiments. As mentioned, we simply select r states and r actions that are far from each other in their respective metric. (A sketch of this anchor-selection and completion step follows the table.) |
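
The Experiment Setup row describes an anchor-based procedure: pick r states and r actions that are far apart in their respective metrics, observe Q-values on those anchor rows and columns, and fill in the remaining entries via low-rank completion. Below is a minimal, hypothetical Python/NumPy sketch of that idea. It is not the authors' code (their pseudocode is in Appendix A of the paper); the function names (`farthest_point_anchors`, `complete_q_matrix`), the CUR-style completion formula, and the synthetic rank-3 toy matrix are all illustrative assumptions.

```python
import numpy as np

def farthest_point_anchors(points, r, seed=0):
    """Greedily pick r indices whose points are far apart (farthest-point rule).

    Hypothetical stand-in for the paper's 'select r states/actions that are
    far from each other in their respective metric'.
    """
    rng = np.random.default_rng(seed)
    anchors = [int(rng.integers(len(points)))]
    for _ in range(r - 1):
        # Distance from every point to its nearest already-chosen anchor.
        dists = np.linalg.norm(points[:, None, :] - points[anchors][None, :, :], axis=-1)
        anchors.append(int(np.argmax(dists.min(axis=1))))
    return np.array(anchors)

def complete_q_matrix(q_rows, q_cols, q_core):
    """CUR-style low-rank completion: Q_hat = C @ pinv(U) @ R, where
    C = Q[:, anchor_actions], U = Q[anchor_states, anchor_actions],
    R = Q[anchor_states, :]. Only these anchor entries need to be observed."""
    # rcond truncates numerically-zero singular values of the small core.
    return q_cols @ np.linalg.pinv(q_core, rcond=1e-10) @ q_rows

# Toy demonstration: a synthetic rank-3 "Q matrix" over discretized grids.
rng = np.random.default_rng(1)
states = rng.uniform(size=(200, 2))    # 200 discretized states in a 2-D space
actions = rng.uniform(size=(100, 1))   # 100 discretized actions in a 1-D space
Q = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 100))  # exactly rank 3

r = 10  # number of anchors; matches the r = 10 reported in the setup
anchor_s = farthest_point_anchors(states, r)
anchor_a = farthest_point_anchors(actions, r)

# Pretend only the anchor rows/columns of Q were estimated by sampling.
Q_hat = complete_q_matrix(
    q_rows=Q[anchor_s, :],
    q_cols=Q[:, anchor_a],
    q_core=Q[np.ix_(anchor_s, anchor_a)],
)
print("relative error:", np.linalg.norm(Q_hat - Q) / np.linalg.norm(Q))
```

On an exactly low-rank matrix this reconstruction is exact whenever the r-by-r core has the same rank as Q, which is one reason spreading the anchors far apart (keeping the core well-conditioned) matters; the paper's actual estimator and its guarantees differ in detail.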