Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentiable Structure Learning and Causal Discovery for General Binary Data

Authors: Chang Deng, Bryon Aragam

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results demonstrate that our approach effectively captures complex relationships in discrete data. 6 Experiments We solve (13) using either DAGMA [5] or NOTEARS-MLP [55]... Our method, denoted BINOTEARS, is compared against several baselines, including DAGMA [5], PC [41], and FGES [37]. Our primary empirical results appear in Figures 1 and 2. We evaluate accuracy using structural Hamming distance (SHD)... In Figure 1 (a), we simulate data X according to the SEM in (7)... In Figure 2, we consider a larger DAG... We further evaluate NOTEARS (linear), NOTEARS-MLP, and BINOTEARS (ours) on the real-world dataset of Sachs et al. [39].
Researcher Affiliation	Academia	Chang Deng, Bryon Aragam; Booth School of Business, University of Chicago, Chicago, IL 60637
Pseudocode	Yes	C.4 Procedure for recovering causal graph and parameters. To formalize the recovery procedure from Section 4.2, we present Algorithms 1 and 2. Algorithm 1: RECOVERPARENTS(p, π, j); Algorithm 2: RECOVERDAG(p, π); Algorithm 3: RECOVERSPARSESTDAG(p).
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code will be available if the paper gets accepted.
Open Datasets	Yes	We further evaluate NOTEARS (linear), NOTEARS-MLP, and BINOTEARS (ours) on the real-world dataset of Sachs et al. [39]. The data contain n = 7,466 measurements of d = 11 proteins and phospholipids in human immune cells and are accompanied by a widely used consensus network that serves as a gold standard.
Dataset Splits	No	Simulation: We generate random datasets $X \in \mathbb{R}^{n \times p}$ by sampling i.i.d. from the models described above. For each simulation, we produce datasets with n samples across graphs with p nodes. (Small graph) p ∈ {5, 6, 7, 8, 9}, k ∈ {1, 2}, n = 10000, graph types {ER, SF}; (Large graph) p ∈ {10, 20, 30, 40}, k ∈ {1, 2, 4}, n = 1000, graph types {ER, SF}.
Hardware Specification	Yes	D.2 Implementation. Equipment: The experiments are conducted on the following CPU architectures: Intel Broadwell, 28 cores @ 2.4 GHz with 64 GB memory per node; Intel Skylake, 40 cores @ 2.4 GHz with 96 GB memory per node.
Software Dependencies	No	Fast Greedy Equivalence Search (FGES [37]) is based on greedy search and assumes linear dependency between variables. The implementation is based on the py-tetrad package, available at https://github.com/cmu-phil/py-tetrad. We use search.use_bdeu(sample_prior=10, structure_prior=0). PC [42] is a constraint-based method that uses the conditional independence relations induced by causal relationships to learn those relationships. The implementation is based on the py-tetrad package, available at https://github.com/cmu-phil/py-tetrad. We use search.use_chi_square(alpha=0.1). NOTEARS-MLP [55] is a continuous DAG-learning method... Its Python implementation is available at https://github.com/xunzheng/notears. DAGMA [5] is a continuous DAG-learning algorithm... Its implementation can be found at https://github.com/kevinsbello/dagma.
Experiment Setup	Yes	Hyperparameter tuning: Theorem 3 indicates that one should ideally choose small values of λ and δ for the quasi-MCP penalty... Empirically, γ = 0.5, λ = 0.05, and δ = 0.2 perform well in our experiments.
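The Research Type row reports accuracy as structural Hamming distance (SHD). Below is a minimal Python sketch of one common SHD convention. It assumes both graphs are DAGs stored as 0/1 adjacency matrices with B[i, j] = 1 for an edge i -> j, and it counts each missing, extra, or reversed edge as one error. This page does not state the paper's exact convention, and the function name `shd` is illustrative.

```python
import numpy as np


def shd(B_true, B_est):
    """Structural Hamming distance between two DAG adjacency matrices.

    Assumed convention: B[i, j] = 1 means i -> j; a missing, extra, or
    reversed edge each adds one to the distance.
    """
    B_true = np.asarray(B_true, dtype=bool)
    B_est = np.asarray(B_est, dtype=bool)
    d = B_true.shape[0]
    errors = 0
    # Compare each unordered pair {i, j} once: (i->j present, j->i present).
    for i in range(d):
        for j in range(i + 1, d):
            true_pair = (bool(B_true[i, j]), bool(B_true[j, i]))
            est_pair = (bool(B_est[i, j]), bool(B_est[j, i]))
            if true_pair != est_pair:
                errors += 1
    return errors


# True chain 0 -> 1 -> 2; the estimate reverses 0 -> 1 and adds 0 -> 2.
B_true = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
B_est = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]
print(shd(B_true, B_est))
```

In the example, the estimate has one reversed edge and one extra edge, so its distance from the true graph is 2.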
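The Pseudocode row lists only the signatures of RECOVERPARENTS, RECOVERDAG, and RECOVERSPARSESTDAG. The sketch below gives one plausible reading under stated assumptions, not the authors' algorithm:

- p is a known joint pmf over binary variables, stored as a dict from 0/1 tuples to probabilities.
- π is a variable ordering.
- The parents of j are the smallest subset of j's predecessors that makes j conditionally independent of the remaining predecessors.
- The sparsest DAG is found by brute force over all orderings.

All Python function names are hypothetical renderings of the listed ones.

```python
from itertools import combinations, permutations, product


def marginal(p, idx):
    """Marginalize the joint pmf p (dict: binary tuple -> prob) onto indices idx."""
    out = {}
    for x, prob in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


def cond_indep(p, j, a, s, tol=1e-9):
    """Exact check of X_j independent of X_a given X_s: p(j,a,s) p(s) == p(j,s) p(a,s)."""
    if not a:
        return True
    p_jas = marginal(p, (j,) + a + s)
    p_s = marginal(p, s)
    p_js = marginal(p, (j,) + s)
    p_as = marginal(p, a + s)
    for xj in (0, 1):
        for xa in product((0, 1), repeat=len(a)):
            for xs in product((0, 1), repeat=len(s)):
                lhs = p_jas.get((xj,) + xa + xs, 0.0) * p_s.get(xs, 0.0)
                rhs = p_js.get((xj,) + xs, 0.0) * p_as.get(xa + xs, 0.0)
                if abs(lhs - rhs) > tol:
                    return False
    return True


def recover_parents(p, order, j):
    """Hypothetical RECOVERPARENTS: smallest predecessor set screening off the rest."""
    pre = tuple(order[: order.index(j)])
    for k in range(len(pre) + 1):
        for s in combinations(pre, k):
            rest = tuple(v for v in pre if v not in s)
            if cond_indep(p, j, rest, s):
                return set(s)
    return set(pre)  # not reached: S = pre always passes


def recover_dag(p, order):
    """Hypothetical RECOVERDAG: parent sets of every node under a fixed ordering."""
    return {j: recover_parents(p, order, j) for j in order}


def recover_sparsest_dag(p):
    """Hypothetical RECOVERSPARSESTDAG: brute force over orderings, fewest edges wins."""
    d = len(next(iter(p)))
    best_edges, best_dag = None, None
    for order in permutations(range(d)):
        dag = recover_dag(p, list(order))
        n_edges = sum(len(pa) for pa in dag.values())
        if best_edges is None or n_edges < best_edges:
            best_edges, best_dag = n_edges, dag
    return best_dag


# Toy chain X0 -> X1 -> X2 with hand-picked conditional probabilities.
p = {}
for x0, x1, x2 in product((0, 1), repeat=3):
    q1 = 0.8 if x0 else 0.2  # P(X1 = 1 | X0)
    q2 = 0.9 if x1 else 0.1  # P(X2 = 1 | X1)
    p[(x0, x1, x2)] = (0.3 if x0 else 0.7) * (q1 if x1 else 1 - q1) * (q2 if x2 else 1 - q2)

print(recover_sparsest_dag(p))  # {0: set(), 1: {0}, 2: {1}}
```

On the toy chain, the search returns the parent sets of X0 -> X1 -> X2. Enumerating all d! orderings and all predecessor subsets is feasible only for a handful of variables.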
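The Dataset Splits row describes synthetic data drawn from ER and SF graphs with p nodes and density parameter k. The rough sketch below shows how binary data of this shape could be simulated, under two assumptions:

- ER-k follows the common NOTEARS-style convention of about k·p expected edges.
- A simple logistic model stands in for the SEM in (7), which this page does not reproduce.

Scale-free (SF) graphs are not covered. The weight range and all function names are illustrative assumptions.

```python
import numpy as np


def simulate_er_dag(p, k, rng):
    """Random ER DAG on p nodes with about k*p expected edges (assumed ER-k convention).

    Returns a 0/1 matrix B with B[i, j] = 1 meaning an edge i -> j.
    """
    prob = min(1.0, 2.0 * k / (p - 1))
    # Strictly lower-triangular draws give an acyclic graph; a random relabeling hides the order.
    lower = np.tril(rng.random((p, p)) < prob, k=-1).astype(int)
    perm = rng.permutation(p)
    return lower[np.ix_(perm, perm)]


def topological_order(B):
    """Kahn's algorithm; raises ValueError if B contains a cycle."""
    indeg = B.sum(axis=0).tolist()
    stack = [j for j, deg in enumerate(indeg) if deg == 0]
    order = []
    while stack:
        i = stack.pop()
        order.append(i)
        for j in np.flatnonzero(B[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                stack.append(int(j))
    if len(order) != len(indeg):
        raise ValueError("graph has a cycle")
    return order


def simulate_binary_logistic(B, n, rng, w_range=(0.5, 2.0)):
    """Sample n binary rows; each X_j ~ Bernoulli(sigmoid(weighted sum of parents)).

    A hypothetical stand-in for the paper's SEM (7), not a reproduction of it.
    """
    p = B.shape[0]
    W = B * rng.choice([-1.0, 1.0], size=(p, p)) * rng.uniform(*w_range, size=(p, p))
    X = np.zeros((n, p), dtype=int)
    for j in topological_order(B):
        logits = X @ W[:, j]  # only already-sampled parents carry nonzero weight
        X[:, j] = rng.random(n) < 1.0 / (1.0 + np.exp(-logits))
    return X


rng = np.random.default_rng(0)
B = simulate_er_dag(p=10, k=2, rng=rng)
X = simulate_binary_logistic(B, n=1000, rng=rng)
print(X.shape, int(B.sum()))
```

Sampling nodes in topological order guarantees that every parent column is filled before its children are drawn.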
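The Software Dependencies row configures PC with search.use_chi_square(alpha=0.1). To illustrate what such a test does on binary data, here is a minimal stratified Pearson chi-square test of conditional independence written with NumPy and SciPy. It is an assumed stand-in, not Tetrad's implementation, and may differ from it in details such as degrees-of-freedom handling. The name chi_square_ci_test is hypothetical.

```python
from itertools import product

import numpy as np
from scipy.stats import chi2


def chi_square_ci_test(X, i, j, cond, alpha=0.1):
    """Stratified Pearson chi-square test of X_i independent of X_j given X_cond.

    Assumes binary 0/1 columns. Returns (independent, p_value); independence
    is accepted when p_value > alpha.
    """
    stat, dof = 0.0, 0
    for z in product((0, 1), repeat=len(cond)):
        # Select rows where the conditioning variables equal the assignment z.
        mask = np.ones(X.shape[0], dtype=bool)
        for c, v in zip(cond, z):
            mask &= X[:, c] == v
        xi, xj = X[mask, i], X[mask, j]
        # 2x2 contingency table of (X_i, X_j) within this stratum.
        table = np.array(
            [[np.sum((xi == a) & (xj == b)) for b in (0, 1)] for a in (0, 1)],
            dtype=float,
        )
        n_z = table.sum()
        if n_z == 0:
            continue
        expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n_z
        nonzero = expected > 0
        stat += np.sum((table[nonzero] - expected[nonzero]) ** 2 / expected[nonzero])
        # Degrees of freedom shrink when a row or column of the stratum is empty.
        rows = np.count_nonzero(table.sum(axis=1))
        cols = np.count_nonzero(table.sum(axis=0))
        dof += max(rows - 1, 0) * max(cols - 1, 0)
    if dof == 0:
        return True, 1.0
    p_value = float(chi2.sf(stat, dof))
    return p_value > alpha, p_value


# Binary chain X0 -> X1 -> X2, each link flipping its input with probability 0.1.
rng = np.random.default_rng(1)
n = 2000
x0 = rng.integers(0, 2, n)
x1 = x0 ^ (rng.random(n) < 0.1)
x2 = x1 ^ (rng.random(n) < 0.1)
X = np.column_stack([x0, x1, x2])

dep_indep, dep_p = chi_square_ci_test(X, 0, 2, [], alpha=0.1)   # marginal test
ci_indep, ci_p = chi_square_ci_test(X, 0, 2, [1], alpha=0.1)    # given X1
print(dep_indep, dep_p, ci_indep, ci_p)
```

PC removes an edge when some conditioning set yields a p-value above alpha. A larger alpha such as 0.1 therefore rejects independence more readily and tends to keep more edges than a stricter level.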
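The Experiment Setup row reports λ = 0.05 and δ = 0.2 for the quasi-MCP penalty but does not give the penalty's formula. The sketch below assumes the form p(t) = λ[(|t| − t²/(2δ))·1{|t| < δ} + (δ/2)·1{|t| ≥ δ}], a variant of the minimax concave penalty (MCP), and evaluates it at the reported values. Both the formula and the function name are assumptions, and the role of γ = 0.5 is not modeled.

```python
import numpy as np


def quasi_mcp(t, lam=0.05, delta=0.2):
    """Quasi-MCP penalty in an ASSUMED form (not taken from this page):

    lam * (|t| - t**2 / (2 * delta))   if |t| < delta
    lam * delta / 2                    otherwise
    """
    a = np.abs(np.asarray(t, dtype=float))
    inner = lam * (a - a ** 2 / (2.0 * delta))  # concave part near zero
    outer = lam * delta / 2.0                   # constant cap for large weights
    return np.where(a < delta, inner, outer)


values = quasi_mcp([0.0, 0.1, -0.1, 0.2, 1.0])
print(values)
```

Under this form, the penalty behaves like λ|t| near zero and is constant at λδ/2 = 0.005 for |t| ≥ δ. Large edge weights are therefore not shrunk further, which is why small λ and δ interfere little with strong edges.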