Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DAGs with NO TEARS: Continuous Optimization for Structure Learning

Authors: Xun Zheng, Bryon Aragam, Pradeep K. Ravikumar, Eric P. Xing

NeurIPS 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compared our method against greedy equivalent search (GES) [9, 31], the PC algorithm [42], and LiNGAM [38]. ... In each experiment, a random graph G was generated from one of two random graph models, Erdös-Rényi (ER) or scale-free (SF). ... We now examine our method for structure recovery, which is shown in Figure 3. ... In our experiments we generated random graphs with d = 10, and then generated 10 simulated datasets containing n = 20 samples (for high-dimensions) and n = 1000 (for low-dimensions). We then compared the scores returned by our method to the exact global minimizer computed by GOBNILP along with the estimated parameters. The results are shown in Table 1.
Researcher Affiliation	Collaboration	Xun Zheng¹, Bryon Aragam¹, Pradeep Ravikumar¹, Eric P. Xing¹,² ¹Carnegie Mellon University ²Petuum Inc. EMAIL
Pseudocode	Yes	Algorithm 1 NOTEARS algorithm
Open Source Code	Yes	The implementation is publicly available at https://github.com/xunzheng/notears.
Open Datasets	Yes	In each experiment, a random graph G was generated from one of two random graph models, Erdös-Rényi (ER) or scale-free (SF). ... We also compared FGS and NOTEARS on a real dataset provided by Sachs et al. [33].
Dataset Splits	No	The paper does not explicitly state the use of validation splits or a specific validation methodology like cross-validation. It mentions training models and evaluating on test sets but lacks details on validation splits.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running experiments were mentioned in the paper.
Software Dependencies	No	The paper mentions L-BFGS, PQN method, and GOBNILP but does not specify version numbers for these or any other software components.
Experiment Setup	Yes	For brevity, we outline the basic set-up of our experiments here; precise details of our experimental set-up, including all parameter choices and more detailed evaluations, can be found in Appendix E.