Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

One-Round Active Learning through Data Utility Learning and Proxy Models

Authors: Jiachen T. Wang, Si Chen, Ruoxi Jia

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response (evidence)
Research Type | Experimental | "In this section, we first evaluate the two building blocks of DULO: (1) the effectiveness of using data utility model to predict the performance of a model trained on a given dataset, and (2) the effectiveness of the two heuristics proposed in Section 4 for measuring the transferability between learning algorithms. Finally, we evaluate the performance of DULO and compare it with the existing one-round AL strategies and the state-of-the-art batch AL strategies in one-round setting."
Researcher Affiliation | Academia | Jiachen T. Wang (Princeton University); Si Chen (Virginia Tech); Ruoxi Jia (Virginia Tech)
Pseudocode | Yes | Appendix A ("Pseudocode of the Proposed One-Round AL Algorithm"), Algorithm 1: DULO for One-Round AL
Open Source Code | No | The paper states: "We test these baselines with open-source implementation⁴." This refers to the baselines used for comparison, not to the authors' own code for DULO. The paper contains no statement or link providing access to source code for the proposed method.
Open Datasets | Yes | "Datasets & Proxy Models & Implementation. We summarize the dataset and proxy/target model settings in Table 1." MNIST: 70,000 handwritten digits as 28×28 grayscale images. CIFAR10: 60,000 3-channel 32×32 images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). USPS: a real-life dataset of 9,298 handwritten digits. PubFig83: a real-life dataset of 13,837 facial images of 83 individuals. CIFAR100: similar to CIFAR10 but with 100 classes. IMDb: 50,000 movie reviews, each labeled as positive or negative.
Dataset Splits | Yes | "The initial labeled data set L = {(x_i, f(x_i))}_{i=1}^{N} is randomly partitioned into a training set L_tr and a validation set L_val." For all datasets, the authors randomly sample 4000 subsets of L_tr and use the corresponding proxy model to generate the training data for utility learning. Class imbalance is generated artificially: for the MNIST unlabeled dataset, 55% of instances are sampled from one class and the rest uniformly across the remaining 9 classes; for CIFAR10's unlabeled dataset, 50% of instances come from two classes, 25% from another two classes, and 25% from the remaining 6 classes. For MNIST, noise at each of the scales 0.25, 0.6, and 1.0 is added to 25% of the unlabeled data.
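The class-imbalance construction quoted above (e.g., 55% of the MNIST unlabeled pool drawn from a single class, the rest spread uniformly over the other classes) can be sketched as a small sampling helper. The function name, signature, and NumPy usage below are illustrative assumptions, not taken from the authors' code:

```python
import numpy as np

def make_imbalanced_indices(labels, majority_class, majority_frac, total, seed=None):
    """Sample `total` indices so that roughly `majority_frac` of them come from
    `majority_class` and the remainder is split evenly over the other classes.
    Mirrors the MNIST setting quoted above (55% from one class); hypothetical
    helper, not the authors' implementation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_major = int(round(majority_frac * total))
    rest = [c for c in classes if c != majority_class]
    n_per_rest = (total - n_major) // len(rest)
    # draw the over-represented class first, then the uniform remainder
    idx = list(rng.choice(np.where(labels == majority_class)[0], n_major, replace=False))
    for c in rest:
        idx.extend(rng.choice(np.where(labels == c)[0], n_per_rest, replace=False))
    return np.array(idx)
```

For the CIFAR10 setting (50%/25%/25% over groups of classes), the same idea applies with per-group fractions instead of a single majority class.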
Hardware Specification | Yes | "All of our experiments are performed on Tesla K80 GPU." Additionally, for Table 7 (average runtime in seconds for DULO and different baselines selecting a fixed amount of data points on MNIST and CIFAR10, with the setting described in Section 5.3.1): "The clock time is recorded when running on an NVIDIA A100 80GB GPU and an AMD 64-Core CPU Processor."
Software Dependencies | No | The paper mentions using "scikit-learn" and models such as an LSTM that "follows from PyTorch tutorial", but it does not provide version numbers for these software components (e.g., the scikit-learn or PyTorch version).
Experiment Setup | Yes | "We set stochastic greedy optimization's precision parameter ε = 10⁻³ and optimization block size B = 2000 in all experiments. We use Adam optimizer with learning rate 10⁻³, mini-batch size 32 to train all of the aforementioned models for 30 epochs, except that we train LeNet for 5 epochs when using it for testing AL performance on MNIST. We use the Adam optimizer with learning rate 10⁻⁵, mini-batch size of 32, β₁ = 0.9, and β₂ = 0.999 to train all of the DeepSets-based utility models."
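The quoted setup parameterizes stochastic greedy subset selection with a precision ε and a block size B. A minimal sketch of the standard stochastic greedy template (Mirzasoleiman et al., 2015) is below, with a generic set-utility callable standing in for DULO's learned utility model; names and structure are illustrative assumptions, and the block-wise optimization over B = 2000 candidates is omitted:

```python
import math
import random

def stochastic_greedy(utility, candidates, k, eps=1e-3, seed=0):
    """Select k items approximately maximizing a monotone set function
    `utility` (here a stand-in for the learned data utility model).
    Generic sketch of stochastic greedy, not the authors' implementation."""
    rng = random.Random(seed)
    n = len(candidates)
    # per-step sample size (n/k) * log(1/eps), which yields the standard
    # (1 - 1/e - eps) approximation guarantee for submodular utilities
    s = min(n, max(1, math.ceil((n / k) * math.log(1 / eps))))
    selected, remaining = [], list(candidates)
    for _ in range(k):
        sample = rng.sample(remaining, min(s, len(remaining)))
        base = utility(selected)
        # pick the sampled candidate with the largest marginal gain
        best = max(sample, key=lambda x: utility(selected + [x]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With ε = 10⁻³, the sample size is large enough that small candidate pools are effectively scanned in full; the speedup over plain greedy shows up when n is much larger than k.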