Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Exemplar Guided Active Learning

Authors: Jason S. Hartford, Kevin Leyton-Brown, Hadas Raviv, Dan Padnos, Shahar Lev, Barak Lenz

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show experimentally that our algorithm only costs logarithmically more than a hypothetical approach that knows all true label frequencies and show experimentally that incorporating automated search can significantly reduce the number of samples needed to reach target accuracy levels. Our experiments are designed to test whether automated search with embeddings could find examples of very rare classes and to assess the effect of different skew ratios on performance.
Researcher Affiliation	Industry	AI21 Labs EMAIL AI21 Labs EMAIL AI21 Labs EMAIL AI21 Labs EMAIL AI21 Labs EMAIL AI21 Labs EMAIL
Pseudocode	Yes	Algorithm 1: EGAL: Exemplar Guided Active Learning
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	Beyond these key contributions, we also present a new Reddit word sense disambiguation dataset, which is designed to evaluate active learning methods for highly skewed label distributions. To address this, we collected a new dataset for evaluating active learning methods for word sense disambiguation.
Dataset Splits	No	The paper mentions a 'test set' but does not explicitly define a 'validation' split with percentages or counts.
Hardware Specification	No	The paper mentions using BERT embeddings and Huggingface's Transformer library but does not specify any hardware details like GPU/CPU models or memory used for experiments.
Software Dependencies	No	All experiments used Scikit Learn (Pedregosa et al., 2011)'s multi-class logistic regression classifier... We used Huggingface's Transformer library (Wolf et al., 2019).
Experiment Setup	No	All experiments used Scikit Learn (Pedregosa et al., 2011)'s multi-class logistic regression classifier with default regularization parameters.
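
The Experiment Setup row quotes the paper as using Scikit Learn's multi-class logistic regression classifier with default regularization parameters. As a rough illustration of what that setting might look like, the sketch below fits `LogisticRegression` with its default regularization strength on synthetic data. The feature vectors, class sizes, dimensionality, and `max_iter` value are assumptions made for this example only; they stand in for the embedding features and skewed label distributions the table mentions and are not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for embedding features: three classes with a
# skewed label distribution (200 / 40 / 10 examples), 16-dimensional vectors.
class_sizes = [200, 40, 10]
dim = 16
centers = rng.normal(scale=3.0, size=(len(class_sizes), dim))
X = np.vstack(
    [centers[k] + rng.normal(size=(n, dim)) for k, n in enumerate(class_sizes)]
)
y = np.concatenate([np.full(n, k) for k, n in enumerate(class_sizes)])

# Regularization is left at the library default (C=1.0).
# max_iter is raised only so the solver converges; it is an assumption
# of this sketch, not a setting reported in the source.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

preds = clf.predict(X)
print("C =", clf.C)
print("training accuracy:", (preds == y).mean())
```

Because the table reports no hardware, dataset splits, or pinned library versions, a sketch like this can only show the classifier configuration, not reproduce the paper's numbers.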