Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality
Authors: Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an optimization with constraints perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal deployment complexity, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. (A hedged formal sketch of this constrained objective appears after the table.) |
| Researcher Affiliation | Collaboration | Jiawei Huang¹, Jinglin Chen¹, Li Zhao², Tao Qin², Nan Jiang¹, Tie-Yan Liu². ¹ Department of Computer Science, University of Illinois at Urbana-Champaign {jiaweih, jinglinc, nanjiang}@illinois.edu; ² Microsoft Research Asia {lizo, taoqin, tyliu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Layer-by-Layer Batch Exploration Strategy for Linear MDPs Given Reward Function... Algorithm 2: Deployment-Efficient RL with Covariance Matrix Estimation (a hedged toy sketch of this exploration pattern follows the table) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical formulations, lower bounds, and algorithms for linear MDPs, which are theoretical models. It does not describe experiments using empirical datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments that would require dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require specific hardware for execution. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not specify any software dependencies with version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and presents algorithms (Algorithms 1 and 2) with general parameters (e.g., 'β'), but it does not describe a specific experimental setup with concrete hyperparameter values or training configurations for empirical runs. |
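
The abstract quoted in the Research Type row frames DE-RL as optimization with constraints. As a reading aid, the following is a minimal formal sketch of that objective; the symbols K, N, π̂, V*, ε, δ, and s₁ are illustrative labels chosen here and may not match the paper's exact notation.

```latex
% Hedged sketch of the deployment-efficient RL objective described in the abstract.
\begin{aligned}
&\min_{\text{algorithm}} \; K \quad \text{(number of deployments)} \\
&\text{s.t. each deployment } k = 1,\dots,K \text{ runs one policy } \pi_k
  \text{ and collects a batch of } N \text{ trajectories,} \\
&\text{and the returned policy } \hat{\pi} \text{ satisfies }
  V^{\hat{\pi}}(s_1) \;\ge\; V^{\star}(s_1) - \epsilon
  \text{ with probability at least } 1-\delta.
\end{aligned}
```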
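
The Pseudocode row lists a layer-by-layer batch exploration strategy and a covariance-matrix-based variant. Below is a hedged toy sketch of that general pattern, not a reimplementation of the paper's Algorithms 1 and 2: it uses a small tabular MDP with one-hot features (a special case of a linear MDP), it plans with the true transition kernel instead of the least-squares value estimates the paper's algorithms would build from data, and all names (`phi`, `Lambdas`, `beta`, `greedy_policy_from_bonus`) are hypothetical.

```python
# Hypothetical sketch of layer-by-layer batch exploration in a toy environment
# viewed as a linear MDP (features = one-hot (s, a) indicators).
# The names and constants here are illustrative, not the paper's notation.
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-horizon MDP (assumption: small tabular instance).
S, A, H = 5, 3, 4                              # states, actions, horizon
d = S * A                                      # feature dimension (one-hot features)
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] is a distribution over next states

def phi(s, a):
    """One-hot feature map; a tabular MDP is a special case of a linear MDP."""
    v = np.zeros(d)
    v[s * A + a] = 1.0
    return v

def rollout(policy, n_traj):
    """Deploy a time-dependent policy once and collect one large batch."""
    batch = []
    for _ in range(n_traj):
        s, traj = 0, []
        for h in range(H):
            a = policy[h][s]
            s_next = rng.choice(S, p=P[s, a])
            traj.append((h, s, a, s_next))
            s = s_next
        batch.append(traj)
    return batch

def greedy_policy_from_bonus(Lambdas, target_h, beta):
    """Backward induction that treats the elliptical uncertainty bonus at layer
    `target_h` as the only reward, so the deployed policy is driven toward
    poorly covered feature directions. (For brevity this uses the true kernel P;
    the paper's algorithms would instead fit value functions from data.)"""
    policy = [np.zeros(S, dtype=int) for _ in range(H)]
    V_next = np.zeros(S)
    for h in reversed(range(H)):
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                f = phi(s, a)
                bonus = beta * np.sqrt(f @ np.linalg.solve(Lambdas[h], f)) if h == target_h else 0.0
                Q[s, a] = bonus + P[s, a] @ V_next
        policy[h] = Q.argmax(axis=1)
        V_next = Q.max(axis=1)
    return policy

# Deployment loop: one batch deployment per layer.
N, beta, lam = 200, 1.0, 1.0
Lambdas = [lam * np.eye(d) for _ in range(H)]   # per-layer covariance estimates
deployments = 0
for h in range(H):                              # explore layer by layer
    policy = greedy_policy_from_bonus(Lambdas, target_h=h, beta=beta)
    batch = rollout(policy, N)                  # one deployment = one big batch
    deployments += 1
    for traj in batch:                          # update covariances with the new data
        for (hh, s, a, _) in traj:
            f = phi(s, a)
            Lambdas[hh] += np.outer(f, f)
print(f"used {deployments} deployments; min eigenvalue at last layer:",
      np.linalg.eigvalsh(Lambdas[H - 1]).min())
```

The point of the sketch is only the deployment pattern: each pass through the outer loop deploys a single exploratory policy, collects one large batch, and updates the per-layer covariance matrices, so in this toy version the number of deployments scales with the horizon H rather than with the number of trajectories.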