Multi-User Reinforcement Learning with Low Rank Rewards

Authors: Dheeraj Mysore Nagaraj, Suhas S. Kowshik, Naman Agarwal, Praneeth Netrapalli, Prateek Jain

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We introduce the setting of multi-user collaborative reinforcement learning in the case of tabular and linear MDPs. In our study, we isolate and overcome several technical and conceptual challenges in order to achieve sample-efficient learning. The main technical challenge we encounter is obtaining the right distribution of state-action pairs from users such that we can successfully run low-rank matrix estimation algorithms, without access to a generative model (i.e., we can only deploy policies and query trajectories corresponding to this policy). This requires clever algorithm design since some states can be hard to even reach. In fact, this endeavor goes beyond standard RL methods and is related to functional reward maximization and mean field limits of multi-agent RL. ... We leave the computational aspects to future work. |
| Researcher Affiliation | Industry | ¹Google Research, Bangalore; ²Amazon India; ³Work was done prior to joining Amazon; ⁴Google Research, Princeton. Correspondence to: Dheeraj Nagaraj <dheerajnagaraj@google.com>. |
| Pseudocode | Yes | Algorithm 1: Uniform Mask Sampler for Tabular MDPs ... Algorithm 2: Well Conditioned Matrix Sampler |
| Open Source Code | No | The paper does not provide any links to open-source code or statements about its availability. |
| Open Datasets | No | The paper focuses on theoretical algorithms for MDPs and does not mention specific datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, so no training/validation/test split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental procedures, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms rather than their implementation, so no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations. |
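
The Research Type and Pseudocode rows above refer to running low-rank matrix estimation on a user-by-(state, action) reward matrix from entries collected under a uniform mask. The sketch below is only a minimal illustration of that generic spectral step, not a reproduction of the paper's Algorithm 1 or Algorithm 2; the function name `estimate_low_rank_rewards`, the sampling probability, and the problem sizes are all hypothetical.

```python
import numpy as np

def estimate_low_rank_rewards(R_obs, mask, rank):
    """Spectral estimate of a low-rank reward matrix from masked entries.

    R_obs: (num_users, num_state_actions) matrix, zero where unobserved.
    mask:  boolean array of the same shape; True where an entry was observed.
    rank:  assumed rank r of the true reward matrix.
    """
    p = mask.mean()  # empirical sampling probability of the uniform mask
    # Rescaling the zero-filled matrix by 1/p makes it an unbiased
    # estimate of the full reward matrix in expectation.
    U, s, Vt = np.linalg.svd(R_obs / p, full_matrices=False)
    # Project onto the top-r singular directions.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Hypothetical usage: 50 users, 200 (state, action) pairs, rank-3 rewards.
rng = np.random.default_rng(0)
R_true = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200))
mask = rng.random(R_true.shape) < 0.3  # uniform entry mask, p = 0.3
R_hat = estimate_low_rank_rewards(np.where(mask, R_true, 0.0), mask, rank=3)
print(np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true))
```

Uniform masks are the easy case for this estimator because the 1/p rescaling is then unbiased; the difficulty the abstract excerpt describes is obtaining such well-conditioned samples from deployed policies alone, rather than from a generative model.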