reproducibilityindex.ai

Efficient Identification of Approximate Best Configuration of Training in Large Datasets

Authors: Silu Huang, Chi Wang, Bolin Ding, Surajit Chaudhuri
Pages: 3862-3869

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments with large datasets. We demonstrate that our ABC solution is tens to hundreds of times faster, while returning top configurations with no more than 1% accuracy loss.
Researcher Affiliation	Collaboration	(1) University of Illinois, Urbana-Champaign, IL; (2) Microsoft Research, Redmond, WA; (3) Alibaba Group, Bellevue, WA. Emails: shuang86@illinois.edu, {wang.chi, surajitc}@microsoft.com, bolin.ding@alibaba-inc.com
Pseudocode	Yes	Algorithm 1: ABC
Open Source Code	No	The paper does not provide a direct link to open-source code for its methodology or an explicit statement of code release.
Open Datasets	Yes	We evaluate with five large-scale machine learning benchmarks that are publicly available.
Dataset Splits	No	The paper states:
Hardware Specification	Yes	We conducted our evaluation on a VM with 8 cores and 56 GB RAM.
Software Dependencies	No	The paper mentions
Experiment Setup	Yes	The initial training sample size and testing sample size are 1000 and 2000 respectively. The geometry step size is set to be c = 2. ϵ = 0.01, δ = 0.5.
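
The Experiment Setup values above can be collected into a small configuration object. The following is a minimal sketch, not the authors' code. It assumes that the step size c multiplies the training sample size each round, which this summary does not spell out. The names `ExperimentSetup` and `train_size_schedule` and the `max_size` cap are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    initial_train_size: int = 1000  # initial training sample size
    test_size: int = 2000           # testing sample size
    step: int = 2                   # step size c
    epsilon: float = 0.01           # epsilon
    delta: float = 0.5              # delta


def train_size_schedule(setup: ExperimentSetup, max_size: int) -> List[int]:
    """List training sample sizes, multiplying by `step` until `max_size`.

    Assumption: each round grows the sample by a factor of c. The final
    entry is clipped to `max_size` (for example, the full dataset size).
    """
    if setup.step <= 1:
        raise ValueError("step must be greater than 1")
    sizes: List[int] = []
    n = setup.initial_train_size
    while n < max_size:
        sizes.append(n)
        n *= setup.step
    sizes.append(max_size)
    return sizes


if __name__ == "__main__":
    # Example with a hypothetical dataset of 10,000 rows.
    print(train_size_schedule(ExperimentSetup(), 10_000))
```

Under these assumptions, a hypothetical 10,000-row dataset would be sampled at 1000, 2000, 4000, and 8000 rows before falling back to the full dataset.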