Multi-User Reinforcement Learning with Low Rank Rewards
Authors: Dheeraj Mysore Nagaraj, Suhas S Kowshik, Naman Agarwal, Praneeth Netrapalli, Prateek Jain
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce the setting of multi-user collaborative reinforcement learning in the case of tabular and linear MDPs. In our study, we isolate and overcome several technical and conceptual challenges in order to achieve sample-efficient learning. The main technical challenge we encounter is obtaining the right distribution of state-action pairs from users such that we can successfully run low-rank matrix estimation algorithms, without access to a generative model (i.e., we can only deploy policies and query trajectories corresponding to these policies). This requires clever algorithm design since some states can be hard to even reach. In fact, this endeavor goes beyond standard RL methods and is related to functional reward maximization and mean field limits of multi-agent RL. ... We leave the computational aspects to future work. |
| Researcher Affiliation | Industry | (1) Google Research, Bangalore; (2) Amazon India; (3) Work was done prior to joining Amazon; (4) Google Research, Princeton. Correspondence to: Dheeraj Nagaraj <dheerajnagaraj@google.com>. |
| Pseudocode | Yes | Algorithm 1: Uniform Mask Sampler for Tabular MDPs ... Algorithm 2: Well Conditioned Matrix Sampler |
| Open Source Code | No | The paper does not provide any links to open-source code or statements about its availability. |
| Open Datasets | No | The paper focuses on theoretical algorithms for MDPs and does not mention specific datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, therefore no dataset split information (training/validation/test) is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental procedures, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms, not their implementation. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training configurations. |