Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation
Authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Appendix A (Numerical Experiments): "All experiments in this section were performed on an HP EliteBook 830 G8 with an Intel i7 core and 16 GB of RAM. Each experiment's runtime for individual realizations took at most 2-3 hours, and reproducing all results is feasible within a day." |
| Researcher Affiliation | Academia | Stefan Stojanovic (KTH, Stockholm, Sweden, stesto@kth.se); Yassir Jedra (MIT, Cambridge, USA, jedra@mit.edu); Alexandre Proutiere (KTH, Digital Futures, Stockholm, Sweden, alepro@kth.se) |
| Pseudocode | Yes | Algorithm 1: Low-Rank Policy Iteration (LoRa-PI) |
| Open Source Code | Yes | Please refer to Appendix A and provided code in the supplementary material. |
| Open Datasets | No | The paper mentions using “synthetically generated low-rank MDPs” for numerical experiments (Appendix A). It does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, and testing. It describes using synthetically generated MDPs without explicit split details. |
| Hardware Specification | Yes | All experiments in this section were performed on an HP EliteBook 830 G8 with an Intel i7 core and 16 GB of RAM. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | We considered an MDP with S = A = 2, γ = 0.87, and a reward matrix given by... We initialized VI with V^(0) = [2.86, 2.98]. For LoRa-VI: S = A = 1000, γ = 0.1. We used K = 10 anchors, V^(0) = 0, and rewards are noisy with Gaussian noise σ = 0.01. (A hedged code sketch of this setup follows the table.) |
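
The following is a minimal sketch of the small value-iteration experiment quoted in the "Experiment Setup" row (S = A = 2, γ = 0.87, V^(0) = [2.86, 2.98], noisy rewards with σ = 0.01). The paper's exact reward matrix and transition kernel are not reproduced in the extract, so the ones below are hypothetical placeholders; plain value iteration is shown as a stand-in, not the paper's LoRa-PI/LoRa-VI procedures.

```python
import numpy as np

# Hypothetical setup: the paper's actual reward matrix ("given by...") and
# transitions are not available here, so random placeholders are used.
rng = np.random.default_rng(0)

S, A = 2, 2          # state/action space sizes quoted in the setup
gamma = 0.87         # discount factor quoted in the setup
sigma = 0.01         # Gaussian reward-noise level quoted for the LoRa-VI runs

R = rng.random((S, A))                # placeholder reward matrix (not the paper's)
P = rng.random((S, A, S))             # placeholder transition kernel
P /= P.sum(axis=2, keepdims=True)     # make each P[s, a, :] a probability distribution

def noisy_reward(s: int, a: int) -> float:
    """One noisy reward sample R(s, a) + N(0, sigma^2), as in the noisy-reward setting."""
    return R[s, a] + sigma * rng.standard_normal()

# Plain value iteration from the quoted initialization V^(0) = [2.86, 2.98].
V = np.array([2.86, 2.98])
for _ in range(500):
    Q = R + gamma * (P @ V)           # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print("Value-iteration fixed point estimate:", V)
```

Under this sketch, scaling the same construction to the quoted LoRa-VI configuration (S = A = 1000, γ = 0.1, K = 10 anchors, V^(0) = 0) would additionally require the paper's leveraged entry-wise matrix estimation step, which is not reproduced here.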