Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design

Authors: Jayanth Jagalur-Mohan, Youssef Marzouk

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our theoretical findings on synthetic problems and on a real-world climate monitoring example.
Researcher Affiliation | Academia | Jayanth Jagalur-Mohan EMAIL Youssef Marzouk EMAIL Massachusetts Institute of Technology Cambridge, MA 02139 USA
Pseudocode | Yes | Algorithm 1: Standard batch greedy algorithm; Algorithm 2: Distributed batch greedy algorithm; Algorithm 3: Stochastic batch greedy algorithm; Algorithm 4: Greedy algorithm using modular lower bounds; Algorithm 5: Sequential greedy algorithm for minimizing information loss
Open Source Code | No | The paper mentions: "The code for sELM is publicly available, and more details about the E3SM land models can be found in the works by Lu and Ricciuto (2019); Ricciuto et al. (2018)." This refers to a third-party model used in their experiments, not the source code for the methodology presented in this paper.
Open Datasets | No | The paper uses "synthetic problems" and a "real-world climate monitoring example." For the latter, it states: "Drawing realizations of these parameters yields a simulation ensemble with 2000 samples." This indicates the authors generated data from a model (sELM) rather than using a pre-existing, publicly available dataset that is formally linked or cited for their experiments.
Dataset Splits | No | The paper describes generating "1000 random instances of the forward operator G" for the synthetic problems and using a "simulation ensemble with 2000 samples" from a climate model. However, it does not specify any training, validation, or test dataset splits in the conventional machine learning sense.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU or CPU models, or cloud resources) used for the experiments.
Software Dependencies | No | The paper mentions the "simplified E3SM land model (sELM)" and states "The code for sELM is publicly available." However, it does not give a version number for sELM, nor does it list any other software libraries or tools with version numbers used in the experimental setup.
Experiment Setup | Yes | In Section 5.1, the paper states: "The dimension of the parameters X is set to n = 20, while cardinality of the candidate set of observations Y is fixed at m = 100." It also specifies "correlation lengths 0.105 and 0.021" for the prior and observation error covariances, and that "We draw 1000 random instances of the forward operator G." For algorithm parameters, it considers seven batch sizes, corresponding to q ∈ {1%, 10%, 20%, 30%, 40%, 50%, 100%}.
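For context on the pseudocode listed above: the variants all build on batch greedy selection, where each round adds the q elements with the largest individual marginal gains of a set function. The sketch below is illustrative only, not the paper's implementation; the `coverage` toy function and all names are hypothetical, chosen to mimic a sensor-placement flavor of experimental design.

```python
def batch_greedy(f, ground_set, k, q):
    """Standard batch greedy sketch: each round, add the q remaining
    elements with the largest individual marginal gains f(S + {e}) - f(S),
    until the selected set S reaches cardinality k."""
    S = set()
    while len(S) < k:
        remaining = [e for e in ground_set if e not in S]
        base = f(S)
        # Rank remaining elements by marginal gain relative to current S.
        ranked = sorted(remaining, key=lambda e: f(S | {e}) - base, reverse=True)
        # Add a batch of size q (or fewer, if near the cardinality budget).
        S |= set(ranked[:min(q, k - len(S))])
    return S

# Hypothetical monotone coverage function: element -> set of "targets" covered.
coverage = {0: {1, 2}, 1: {2, 3}, 2: {4, 5, 6}, 3: {1}}
f = lambda S: len(set().union(*(coverage[e] for e in S)))

print(batch_greedy(f, coverage.keys(), k=2, q=1))  # selects {0, 2}
```

With q = 1 this reduces to the classical sequential greedy algorithm; larger q trades per-round optimality for fewer rounds of function evaluations, which is the regime the paper's guarantees address for non-submodular objectives.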