Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimal Decision Tree and Adaptive Submodular Ranking with Noisy Outcomes
Authors: Su Jia, Fatemeh Navidi, Viswanath Nagarajan, R. Ravi
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers and observed that our algorithms have costs very close to the information-theoretic minimum." "Comprehensive Numerical Experiments. We tested our algorithms on both a synthetic and a real data set arising from toxic chemical identification. We compared the empirical performance guarantee of our algorithms to an information-theoretic lower bound. The cost of the solution returned by our non-adaptive algorithm is typically within 50% of this lower bound, and typically within 20% for the adaptive algorithm, demonstrating the effective practical performance of our algorithms." |
| Researcher Affiliation | Collaboration | Su Jia (Center of Data Science for Enterprise and Society, Cornell University); Fatemeh Navidi (Midpoint Markets); Viswanath Nagarajan (Industrial & Operations Engineering, University of Michigan); R. Ravi (Tepper School of Business, Carnegie Mellon University). |
| Pseudocode | Yes | Algorithm 1: Non-adaptive SFRN algorithm. Algorithm 2: Algorithm for ASR instance J, based on Navidi et al. (2020). Algorithm 3: Main algorithm for large number of noisy outcomes. Algorithm 4: Modified algorithm for ASR instance J. |
| Open Source Code | Yes | The implementations of the adaptive and non-adaptive algorithms are available online: https://github.com/FatemehNavidi/ODTN ; https://github.com/sjia1/ODT-with-noisy-outcomes |
| Open Datasets | Yes | "We considered a data set called WISER, which includes 414 chemicals (hypotheses) and 78 binary tests." https://wiser.nlm.nih.gov |
| Dataset Splits | No | The paper evaluates the 'cost' (expected number of tests) of algorithms against an information-theoretic lower bound on various datasets (WISER, CL-d). However, it does not describe any partitioning of these datasets into training, validation, or test sets for model development or evaluation, as the primary objective is to evaluate decision-making costs rather than predictive model performance. Details such as percentages or specific counts for dataset splits are not provided. |
| Hardware Specification | No | The paper describes 'Comprehensive Numerical Experiments' and presents results in tables (Table 2, 3, 4, 6), but it does not specify any hardware details such as GPU/CPU models, memory, or cloud computing resources used to conduct these experiments. |
| Software Dependencies | No | The paper mentions that the algorithms were implemented and provides GitHub links, but it does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | No | The paper evaluates the 'cost' of various algorithms on different datasets. While it describes the generation of these datasets (e.g., 'Random Binary Classifiers with Margin Error', 'Distributions'), it does not provide details on experimental setup parameters such as hyperparameters, learning rates, batch sizes, optimizers, or specific training configurations, as the work focuses on approximation algorithms for decision problems rather than training predictive models. |
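As context for the "Research Type" evidence above, the paper compares each algorithm's expected number of tests against an information-theoretic lower bound. For identifying one hypothesis among several using binary-outcome tests, the Shannon entropy of the prior is such a bound. The sketch below is illustrative only, not code from the paper's repositories; all function names are hypothetical:

```python
import math

def entropy_lower_bound(prior):
    """Shannon entropy (in bits) of the hypothesis prior: any strategy
    that identifies the true hypothesis using binary tests needs at
    least this many tests in expectation."""
    return -sum(p * math.log2(p) for p in prior if p > 0)

def cost_ratio(empirical_cost, lower_bound):
    """Relative gap of the kind the paper's tables report: being
    'within 50%' of the bound means a ratio of at most 1.5."""
    return empirical_cost / lower_bound

# Example: 414 equally likely chemicals, as in the WISER data set.
m = 414
uniform = [1.0 / m] * m
lb = entropy_lower_bound(uniform)  # equals log2(414), roughly 8.69 bits
```

Under a uniform prior this reduces to `log2(m)`, which is why a 78-test battery can in principle separate 414 chemicals; the paper's reported gaps (within 50% non-adaptively, within 20% adaptively) are ratios relative to a bound of this kind.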