Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration

Authors: Suorong Yang, Peng Ye, Furao Shen, Dongzhan Zhou

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that our method outperforms existing state-of-the-art approaches on various benchmark datasets and architectures, e.g., reducing 50% training costs on ImageNet-1k with lossless performance.
Researcher Affiliation	Academia	(1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) Shanghai Artificial Intelligence Laboratory; (3) The Chinese University of Hong Kong. Correspondence to: Furao Shen <EMAIL>, Dongzhan Zhou <EMAIL>.
Pseudocode	No	The paper describes the method using prose and mathematical equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain an explicit statement about the release of their source code or a link to a code repository.
Open Datasets	Yes	In line with previous works (Tan et al., 2024; Xia et al., 2023b; Qin et al., 2024), we evaluate the effectiveness of our proposed method using widely adopted benchmark datasets, including CIFAR-10/100 (Krizhevsky et al., 2009), Tiny ImageNet (Chrabaszcz et al., 2017), and ImageNet-1k (Deng et al., 2009). In addition, we evaluate the robustness of our method in noisy datasets. To further assess the generalization ability of our method, we extend the evaluation to more challenging datasets, such as ImageNet-A/O (Hendrycks et al., 2021b), ImageNet-Hard (Taesiri et al., 2024), and ImageNet-R (Hendrycks et al., 2021a).
Dataset Splits	Yes	In line with previous works (Tan et al., 2024; Xia et al., 2023b; Qin et al., 2024), we evaluate the effectiveness of our proposed method using widely adopted benchmark datasets, including CIFAR-10/100 (Krizhevsky et al., 2009), Tiny ImageNet (Chrabaszcz et al., 2017), and ImageNet-1k (Deng et al., 2009). ... Following standard evaluation settings, we report the area under the precision-recall curve (AUPR) for ImageNet-O and classification accuracy for the other datasets.
Hardware Specification	Yes	Table 2. Results on ImageNet-1k with a 60% selection ratio using ResNet-50 on an 8-A100 server. ... Table 6. Experiment results on more advanced architectures, including ViT-B, ViT-L, and Swin-T on ImageNet-1k with a 4-A100 GPU server. ... Table 7. Overheads of fine-tuning and feature embedding before model training on large-scale datasets with a 1-V100 GPU server.
Software Dependencies	No	Specifically, we use the OneCycle scheduler with the SGD/LARS optimizer for model training, a momentum of 0.9, a weight decay of 5e-4, and cosine annealing. We employ TrivialAugment (Müller & Hutter, 2021) in our framework.
Experiment Setup	Yes	Specifically, we use the OneCycle scheduler with the SGD/LARS optimizer for model training, a momentum of 0.9, a weight decay of 5e-4, and cosine annealing. We employ TrivialAugment (Müller & Hutter, 2021) in our framework. ... Moreover, we use InfoNCE loss to fine-tune adapters for 15 epochs on all datasets.
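
Illustration (not from the paper): the Experiment Setup row quotes a one-cycle learning-rate schedule with cosine annealing and an SGD optimizer with momentum 0.9 and weight decay 5e-4. Since no code release is reported, the following is only a minimal pure-Python sketch of what such a configuration might look like, covering the SGD variant and not LARS. The momentum and weight decay values are taken from the quoted text; the peak learning rate, the warm-up fraction `pct_start`, and the two divisor parameters are not reported in the quoted text and are hypothetical placeholders, as are all function names.

```python
import math

# Values quoted in the Experiment Setup row.
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4


def cosine_interp(start, end, frac):
    """Cosine-anneal from `start` (frac=0) to `end` (frac=1)."""
    return end + (start - end) * (1.0 + math.cos(math.pi * frac)) / 2.0


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` of a one-cycle schedule with cosine annealing.

    ASSUMPTION: max_lr, pct_start, div_factor and final_div_factor are
    illustrative placeholders; the quoted text does not report them.
    """
    if total_steps < 2 or not 0 <= step < total_steps:
        raise ValueError("need total_steps >= 2 and 0 <= step < total_steps")
    initial_lr = max_lr / div_factor           # starting learning rate
    final_lr = initial_lr / final_div_factor   # learning rate at the last step
    # Step index at which the learning rate peaks (end of warm-up).
    peak_step = min(total_steps - 1,
                    max(1, round(pct_start * (total_steps - 1))))
    if step <= peak_step:
        # Warm-up phase: rise from initial_lr to max_lr.
        return cosine_interp(initial_lr, max_lr, step / peak_step)
    # Annealing phase: fall from max_lr to final_lr.
    decay_steps = total_steps - 1 - peak_step
    return cosine_interp(max_lr, final_lr, (step - peak_step) / decay_steps)


def sgd_momentum_step(weight, grad, velocity, lr,
                      momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One scalar SGD update with momentum and L2 weight decay."""
    grad = grad + weight_decay * weight        # fold weight decay into the gradient
    velocity = momentum * velocity + grad      # update the momentum buffer
    return weight - lr * velocity, velocity
```

With these placeholder settings, `one_cycle_lr(s, 100, max_lr=0.1)` rises from 0.004 at step 0 to the peak of 0.1 at step 30 and then anneals toward zero by step 99. Each returned rate would be passed as `lr` to `sgd_momentum_step` for that training step.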