Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
Authors: Kefan Dong, Tengyu Ma
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with finite number of decisions under mild conditions. Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits (Lai et al., 1985) and prior works on linear bandits (Lattimore & Szepesvari, 2017), and improve upon the previous best instance-dependent upper bound (Xu et al., 2021) for reinforcement learning. |
| Researcher Affiliation | Academia | Kefan Dong Stanford University EMAIL Tengyu Ma Stanford University EMAIL |
| Pseudocode | Yes | Algorithm 1 Test-to-Commit (T2C) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | No | The paper does not conduct empirical studies or use datasets for training. It discusses problem types like multi-armed bandits and tabular RL in a theoretical context. |
| Dataset Splits | No | The paper does not conduct empirical studies or use datasets for validation. It discusses theoretical bounds and algorithm design. |
| Hardware Specification | No | The paper does not report empirical experiments, and therefore no specific hardware details are provided. |
| Software Dependencies | No | The paper does not report empirical experiments, and therefore no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis. It does not include specific experimental setup details such as hyperparameters or training configurations for empirical evaluation. |