Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimal Decision Tree and Adaptive Submodular Ranking with Noisy Outcomes
Authors: Su Jia, Fatemeh Navidi, Viswanath Nagarajan, R. Ravi
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers and observed that our algorithms have costs very close to the information-theoretic minimum." "Comprehensive Numerical Experiments. We tested our algorithms on both a synthetic and a real data set arising from toxic chemical identification. We compared the empirical performance guarantee of our algorithms to an information-theoretic lower bound. The cost of the solution returned by our non-adaptive algorithm is typically within 50% of this lower bound, and typically within 20% for the adaptive algorithm, demonstrating the effective practical performance of our algorithms." |
| Researcher Affiliation | Collaboration | Su Jia (Center of Data Science for Enterprise and Society, Cornell University); Fatemeh Navidi (Midpoint Markets); Viswanath Nagarajan (Industrial & Operations Engineering, University of Michigan); R. Ravi (Tepper School of Business, Carnegie Mellon University). |
| Pseudocode | Yes | Algorithm 1: Non-adaptive SFRN algorithm. Algorithm 2: Algorithm for ASR instance J, based on Navidi et al. (2020). Algorithm 3: Main algorithm for large number of noisy outcomes. Algorithm 4: Modified algorithm for ASR instance J. |
| Open Source Code | Yes | The implementations of the adaptive and non-adaptive algorithms are available online: https://github.com/FatemehNavidi/ODTN ; https://github.com/sjia1/ODT-with-noisy-outcomes |
| Open Datasets | Yes | "We considered a data set called WISER, which includes 414 chemicals (hypotheses) and 78 binary tests." https://wiser.nlm.nih.gov |
| Dataset Splits | No | The paper evaluates the 'cost' (expected number of tests) of algorithms against an information-theoretic lower bound on various datasets (WISER, CL-d). However, it does not describe any partitioning of these datasets into training, validation, or test sets for model development or evaluation, as the primary objective is to evaluate decision-making costs rather than predictive model performance. Details such as percentages or specific counts for dataset splits are not provided. |
| Hardware Specification | No | The paper describes 'Comprehensive Numerical Experiments' and presents results in tables (Table 2, 3, 4, 6), but it does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used to conduct these experiments. |
| Software Dependencies | No | The paper mentions that the algorithms were implemented and provides GitHub links, but it does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | No | The paper evaluates the 'cost' of various algorithms on different datasets. While it describes the generation of these datasets (e.g., 'Random Binary Classifiers with Margin Error', 'Distributions'), it does not provide details on experimental setup parameters such as hyperparameters, learning rates, batch sizes, optimizers, or specific training configurations, as the work focuses on approximation algorithms for decision problems rather than training predictive models. |
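As context for the "Research Type" evidence above, the paper compares each algorithm's expected number of tests against an information-theoretic lower bound. For identifying one hypothesis among several using binary-outcome tests, the Shannon entropy of the prior is such a bound. The sketch below is illustrative only, not code from the paper's repositories; all function names are hypothetical:

```python
import math

def entropy_lower_bound(prior):
    """Shannon entropy (in bits) of the hypothesis prior: any strategy
    that identifies the true hypothesis using binary tests needs at
    least this many tests in expectation."""
    return -sum(p * math.log2(p) for p in prior if p > 0)

def cost_ratio(empirical_cost, lower_bound):
    """Relative gap of the kind the paper's tables report: being
    'within 50%' of the bound means a ratio of at most 1.5."""
    return empirical_cost / lower_bound

# Example: 414 equally likely chemicals, as in the WISER data set.
m = 414
uniform = [1.0 / m] * m
lb = entropy_lower_bound(uniform)  # equals log2(414), roughly 8.69 bits
```

Under a uniform prior this reduces to `log2(m)`, which is why a 78-test battery can in principle separate 414 chemicals; the paper's reported gaps (within 50% non-adaptively, within 20% adaptively) are ratios relative to a bound of this kind.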