Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Near Optimal Non-asymptotic Sample Complexity of 1-Identification
Authors: Zitian Li, Wang Chi Cheung
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design a new algorithm, Sequential-Exploration-Exploitation (SEE), and conduct theoretical analysis from the non-asymptotic perspective. Novel to the literature, we achieve near optimality, in the sense of matching upper and lower bounds on the pulling complexity. We also conduct numerical experiments comparing the performance of 1-identification algorithms; the results indicate the effectiveness of our proposed SEE algorithm relative to existing benchmarks and also expose weaknesses of some benchmark algorithms. |
| Researcher Affiliation | Academia | 1Department of Industrial Systems Engineering & Management, National University of Singapore, Singapore. Correspondence to: Zitian Li <EMAIL>, Wang Chi Cheung <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SEE (Informal) ... Algorithm 2 Sequential-Explore-Exploit (SEE) ... Algorithm 3 SEE-Exploration ... Algorithm 4 SEE-Exploitation ... Algorithm 5 Sampling Rule |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | We conduct numerical evaluations on SEE and existing benchmark algorithms on synthetic instances. We consider six different groups of reward vectors: All Worse, Unique Qualified, One Quarter, Half Good, All Good, and Linear. The main difference among these groups is the number of qualified arms. All instances in the All Worse group are negative, meaning the mean rewards of all arms are below µ0. The other groups of instances are positive, meaning there is at least one arm whose mean reward is > µ0. |
| Dataset Splits | No | We conduct numerical evaluations on SEE and existing benchmark algorithms on synthetic instances... In each experiment setting, we run 1000 independent trials and compute the empirical average of the stopping times. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It only discusses the experimental settings and outcomes without mentioning any specific hardware specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for the tools and libraries used in the implementation. |
| Experiment Setup | Yes | The parameter setting of SEE is δ_k = 1/3^k, β_k = 2^(k/4), α_k = 5^k, C = 1.01. In all numerical experiments, all the algorithms in Section 6 achieve 100% accuracy in identifying a correct answer... We took arm numbers K = 10, 20, 30, 40, 50, 100, 150, 200 and tolerance levels δ = 0.001, 0.0001. Fix µ0 = 0.5, = 0.15... We set 10^8 as our forced stopping threshold for each group of experiments... When implementing the algorithm lil'HDoC, its originally proposed warm-up stage is longer than the total number of pulls of some of the benchmark algorithms. To allow comparisons, in our numerical experiments we uniformly pull all the arms only 200 times for lil'HDoC. |
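To make the reported setup concrete, here is a minimal Python sketch of how the six synthetic instance groups and the trial-averaging protocol (1000 independent runs, forced stopping cap of 10^8) could be reproduced. The paper specifies only the number of qualified arms per group, the threshold µ0 = 0.5, and a 0.15 gap; the concrete mean-reward values below, and the `identify` callback interface, are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

MU0 = 0.5   # qualification threshold from the paper
GAP = 0.15  # gap value reported in the setup; used here as an assumed offset

def make_instance(group: str, K: int) -> np.ndarray:
    """Build a mean-reward vector for one of the six groups.

    Only the count of qualified arms (mean > MU0) per group is taken
    from the paper; the exact reward values are illustrative guesses.
    """
    mus = np.full(K, MU0 - GAP)  # start with every arm below threshold
    n_good = {
        "All Worse": 0,
        "Unique Qualified": 1,
        "One Quarter": K // 4,
        "Half Good": K // 2,
        "All Good": K,
    }.get(group)
    if n_good is None:
        if group == "Linear":
            # evenly spaced means straddling the threshold
            return np.linspace(MU0 - GAP, MU0 + GAP, K)
        raise ValueError(f"unknown group: {group}")
    mus[:n_good] = MU0 + GAP
    return mus

def run_trials(identify, mus, n_trials=1000, cap=10**8, seed=0):
    """Average stopping time over independent trials, with a forced cap.

    `identify(mus, rng)` is a placeholder for any 1-identification
    algorithm; it should return the trial's stopping time (pull count).
    """
    rng = np.random.default_rng(seed)
    stops = [min(identify(mus, rng), cap) for _ in range(n_trials)]
    return float(np.mean(stops))
```

A real experiment would plug SEE (or a benchmark such as lil'HDoC) in as `identify` and sweep K over {10, 20, 30, 40, 50, 100, 150, 200} and δ over {0.001, 0.0001}, as the paper does.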