Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Authors: Kefan Dong, Tengyu Ma

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we design the ๏ฌrst asymptotic instance-optimal algorithm for general interactive decision making problems with ๏ฌnite number of decisions under mild conditions. Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits (Lai et al., 1985) and prior works on linear bandits (Lattimore & Szepesvari, 2017), and improve upon the previous best instance-dependent upper bound (Xu et al., 2021) for reinforcement learning.
Researcher Affiliation Academia Kefan Dong Stanford University EMAIL Tengyu Ma Stanford University EMAIL
Pseudocode Yes Algorithm 1 Test-to-Commit (T2C)
Open Source Code No The paper does not provide any explicit statements about releasing source code, nor does it include a link to a code repository.
Open Datasets No The paper does not conduct empirical studies or use datasets for training. It discusses problem types like multi-armed bandits and tabular RL in a theoretical context.
Dataset Splits No The paper does not conduct empirical studies or use datasets for validation. It discusses theoretical bounds and algorithm design.
Hardware Specification No The paper does not report empirical experiments, and therefore no specific hardware details are provided.
Software Dependencies No The paper does not report empirical experiments, and therefore no specific software dependencies with version numbers are listed.
Experiment Setup No The paper focuses on theoretical algorithm design and analysis. It does not include specific experimental setup details such as hyperparameters or training configurations for empirical evaluation.