Multi-User Reinforcement Learning with Low Rank Rewards

Authors: Dheeraj Mysore Nagaraj, Suhas S. Kowshik, Naman Agarwal, Praneeth Netrapalli, Prateek Jain

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We introduce the setting of multi-user collaborative reinforcement learning in the case of tabular and linear MDPs. In our study, we isolate and overcome several technical and conceptual challenges in order to achieve sample-efficient learning. The main technical challenge we encounter is obtaining the right distribution of state-action pairs from users such that we can successfully run low-rank matrix estimation algorithms, without access to a generative model (i.e., we can only deploy policies and query trajectories corresponding to this policy). This requires clever algorithm design since some states can be hard to even reach. In fact, this endeavor goes beyond standard RL methods and is related to functional reward maximization and mean field limits of multi-agent RL. ... We leave the computational aspects to future work. |
| Researcher Affiliation | Industry | ¹Google Research, Bangalore; ²Amazon India; ³Work was done prior to joining Amazon; ⁴Google Research, Princeton. Correspondence to: Dheeraj Nagaraj <dheerajnagaraj@google.com>. |
| Pseudocode | Yes | Algorithm 1: Uniform Mask Sampler for Tabular MDPs ... Algorithm 2: Well Conditioned Matrix Sampler |
| Open Source Code | No | The paper does not provide any links to open-source code or statements about its availability. |
| Open Datasets | No | The paper focuses on theoretical algorithms for MDPs and does not mention specific datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, so no training/validation/test split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental procedures, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms rather than their implementation, so no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations. |
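
The Research Type and Pseudocode rows above refer to running low-rank matrix estimation on a user-by-(state, action) reward matrix from entries collected under a uniform mask. The sketch below is only a minimal illustration of that generic spectral step, not a reproduction of the paper's Algorithm 1 or Algorithm 2; the function name `estimate_low_rank_rewards`, the sampling probability, and the problem sizes are all hypothetical.

```python
import numpy as np

def estimate_low_rank_rewards(R_obs, mask, rank):
    """Spectral estimate of a low-rank reward matrix from masked entries.

    R_obs: (num_users, num_state_actions) matrix, zero where unobserved.
    mask:  boolean array of the same shape; True where an entry was observed.
    rank:  assumed rank r of the true reward matrix.
    """
    p = mask.mean()  # empirical sampling probability of the uniform mask
    # Rescaling the zero-filled matrix by 1/p makes it an unbiased
    # estimate of the full reward matrix in expectation.
    U, s, Vt = np.linalg.svd(R_obs / p, full_matrices=False)
    # Project onto the top-r singular directions.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Hypothetical usage: 50 users, 200 (state, action) pairs, rank-3 rewards.
rng = np.random.default_rng(0)
R_true = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200))
mask = rng.random(R_true.shape) < 0.3  # uniform entry mask, p = 0.3
R_hat = estimate_low_rank_rewards(np.where(mask, R_true, 0.0), mask, rank=3)
print(np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true))
```

Uniform masks are the easy case for this estimator because the 1/p rescaling is then unbiased; the difficulty the abstract excerpt describes is obtaining such well-conditioned samples from deployed policies alone, rather than from a generative model.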