Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality
Authors: Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an optimization with constraints perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal deployment complexity, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. |
| Researcher Affiliation | Collaboration | Jiawei Huang¹, Jinglin Chen¹, Li Zhao², Tao Qin², Nan Jiang¹, Tie-Yan Liu²; ¹Department of Computer Science, University of Illinois at Urbana-Champaign; ²Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1: Layer-by-Layer Batch Exploration Strategy for Linear MDPs Given Reward Function... Algorithm 2: Deployment-Efficient RL with Covariance Matrix Estimation |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical formulations, lower bounds, and algorithms for linear MDPs, which are theoretical models. It does not describe experiments using empirical datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments that would require dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require specific hardware for execution. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs. It does not specify any software dependencies with version numbers for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and presents algorithms (Algorithm 1 and 2) with general parameters (e.g., 'β'), but it does not describe a specific experimental setup with concrete hyperparameter values or training configurations for empirical runs. |
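The Pseudocode row names Algorithm 2, "Deployment-Efficient RL with Covariance Matrix Estimation", and the Research Type row describes a setting where each deployment samples a large batch of data. The snippet below is only a rough illustration of that setting and is not the paper's algorithm. It shows a quantity commonly tracked in linear-MDP methods: the regularized feature covariance Λ = λI + Σ φφᵀ and the elliptical uncertainty ‖φ‖_{Λ⁻¹} = sqrt(φᵀ Λ⁻¹ φ). Here Λ is updated once per deployment from a whole batch. The feature dimension `d`, regularizer `lam`, batch size, and Gaussian random features are hypothetical choices made for the sketch.

```python
import numpy as np

def update_covariance(cov, features):
    """Add one deployment's batch of feature vectors (rows) to the covariance matrix."""
    return cov + features.T @ features

def uncertainty(cov, phi):
    """Elliptical uncertainty ||phi||_{cov^{-1}} = sqrt(phi^T cov^{-1} phi)."""
    return float(np.sqrt(phi @ np.linalg.solve(cov, phi)))

d = 4        # hypothetical feature dimension
lam = 1.0    # hypothetical ridge regularizer (the lambda in Lambda = lam*I + sum phi phi^T)
rng = np.random.default_rng(0)

cov = lam * np.eye(d)
phi_query = np.ones(d) / np.sqrt(d)  # a unit-norm query feature vector

# Uncertainty before any data, then after each of three deployments.
history = [uncertainty(cov, phi_query)]
for deployment in range(3):
    # Hypothetical stand-in for the features collected in one large-batch deployment.
    batch = rng.normal(size=(500, d)) / np.sqrt(d)
    cov = update_covariance(cov, batch)
    history.append(uncertainty(cov, phi_query))

print(history)
```

Adding a positive semidefinite batch term can only shrink φᵀΛ⁻¹φ, so the uncertainty in `history` never increases from one deployment to the next. This is the intuition behind collecting large batches: a few deployments can reduce uncertainty substantially in every feature direction.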