Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Journal of Machine Learning Research (JMLR) - 2018

[Figure: Documentation Rate of Empirical Papers by Reproducibility Variable]

[Figure: Distribution of Empirical Papers by Number of Documented Variables]

Venue	Year	Papers	Reproducibility Score	Documentation Score	% Empirical	% Industry	Website
JMLR	2018	84	0.36	3.35	89.29%	21.33%	

Reproducibility Score: based on Gundersen et al. (2025). See Methods for details.
Documentation Score: the average score over the seven reproducibility variables for empirical research papers. See Methods for details.
% Empirical: percentage of papers that are empirical research rather than theoretical research.
% Industry: percentage of empirical research papers with at least one author from industry.
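
As a rough illustration of how a Documentation Score of this kind could be computed, the sketch below averages the number of documented variables per paper over the empirical papers only. This is an assumption about the aggregation, inferred from the column description and the 0 to 7 scale of the reported value; the authoritative definition is in Methods. The function names, the `empirical` field, and the empirical/theoretical labels on the sample rows are hypothetical and chosen only for illustration; the page does not state which individual papers were classified as empirical.

```python
# Sketch only: assumes the Documentation Score is the mean count of
# documented variables (0 to 7) across empirical papers.

VARIABLES = (
    "Pseudocode", "Open Source Code", "Open Datasets", "Dataset Splits",
    "Hardware Specification", "Software Dependencies", "Experiment Setup",
)


def documented_count(flags):
    """Return how many of the seven variables a paper documents."""
    if len(flags) != len(VARIABLES):
        raise ValueError(f"expected {len(VARIABLES)} flags, got {len(flags)}")
    return sum(1 for flag in flags if flag)


def documentation_score(papers):
    """Mean documented-variable count over empirical papers only."""
    counts = [documented_count(p["flags"]) for p in papers if p["empirical"]]
    return sum(counts) / len(counts) if counts else 0.0


# Flags copied from three rows of the paper table on this page.
# The "empirical" labels are HYPOTHETICAL, not taken from the source.
papers = [
    {"title": "A Constructive Approach to $L_0$ Penalized Regression",
     "flags": [True, True, False, False, False, False, True],
     "empirical": True},
    {"title": "Scalable Bayes via Barycenter in Wasserstein Space",
     "flags": [True, True, True, True, True, True, True],
     "empirical": True},
    {"title": "On Tight Bounds for the Lasso",
     "flags": [False, False, False, False, False, False, False],
     "empirical": False},  # assumed theoretical, so excluded from the mean
]

score = documentation_score(papers)
```

With these three sample rows, only the first two contribute to the average, so a paper labelled theoretical does not pull the score down even though it documents none of the variables.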

Paper	Pseudocode	Open Source Code	Open Datasets	Dataset Splits	Hardware Specification	Software Dependencies	Experiment Setup	Documented Variables
A Constructive Approach to $L_0$ Penalized Regression	✅	✅	❌	❌	❌	❌	✅	3
A Direct Approach for Sparse Quadratic Discriminant Analysis	✅	❌	✅	✅	❌	❌	✅	4
A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference	✅	❌	❌	✅	✅	❌	✅	4
A New and Flexible Approach to the Analysis of Paired Comparison Data	✅	❌	✅	❌	❌	❌	✅	3
A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation	✅	✅	❌	❌	✅	✅	✅	5
A Random Matrix Analysis and Improvement of Semi-Supervised Learning for Large Dimensional Data	✅	❌	✅	✅	❌	❌	✅	4
A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization	❌	❌	❌	✅	❌	❌	✅	2
A Spectral Approach for the Design of Experiments: Design, Analysis and Algorithms	✅	❌	❌	✅	❌	❌	✅	3
A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations	❌	❌	✅	✅	✅	❌	✅	4
Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization	✅	✅	✅	✅	✅	✅	✅	7
An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach	✅	❌	✅	❌	❌	❌	✅	3
An efficient distributed learning algorithm based on effective local functional approximations	✅	❌	✅	❌	✅	❌	✅	4
Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection	✅	❌	✅	❌	❌	❌	❌	2
Can We Trust the Bootstrap in High-dimensions? The Case of Linear Models	❌	❌	❌	❌	❌	❌	✅	1
Change-Point Computation for Large Graphical Models: A Scalable Algorithm for Gaussian Graphical Models with Change-Points	✅	✅	❌	❌	❌	✅	✅	4
Clustering is semidefinitely not that hard: Nonnegative SDP for manifold disentangling	✅	✅	✅	❌	✅	❌	✅	5
Connections with Robust PCA and the Role of Emergent Sparsity in Variational Autoencoder Models	❌	❌	✅	❌	❌	❌	✅	2
Covariances, Robustness, and Variational Bayes	❌	✅	✅	❌	❌	✅	✅	4
DALEX: Explainers for Complex Predictive Models in R	❌	✅	❌	❌	❌	✅	❌	2
Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations	❌	✅	✅	✅	❌	❌	✅	4
Design and Analysis of the NIPS 2016 Review Process	❌	❌	❌	✅	❌	❌	✅	2
Distributed Proximal Gradient Algorithm for Partially Asynchronous Computer Clusters	✅	❌	❌	❌	✅	❌	✅	3
Distribution-Specific Hardness of Learning Neural Networks	❌	❌	❌	❌	❌	❌	❌	0
Dual Principal Component Pursuit	✅	❌	✅	❌	✅	✅	✅	5
ELFI: Engine for Likelihood-Free Inference	✅	✅	❌	❌	❌	❌	❌	2
Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes	✅	❌	✅	✅	❌	❌	✅	4
Emergence of Invariance and Disentanglement in Deep Representations	❌	❌	✅	✅	❌	❌	✅	3
Experience Selection in Deep Reinforcement Learning for Control	❌	✅	✅	❌	❌	❌	✅	3
Extrapolating Expected Accuracies for Large Multi-Class Problems	❌	✅	✅	✅	❌	❌	✅	4
Fast MCMC Sampling Algorithms on Polytopes	✅	✅	❌	❌	❌	❌	✅	3
Generalized Rank-Breaking: Computational and Statistical Tradeoffs	✅	❌	✅	❌	❌	❌	✅	3
Goodness-of-Fit Tests for Random Partitions via Symmetric Polynomials	❌	❌	❌	❌	❌	❌	✅	1
Gradient Descent Learns Linear Dynamical Systems	✅	❌	❌	❌	❌	❌	✅	2
Harmonic Mean Iteratively Reweighted Least Squares for Low-Rank Matrix Recovery	✅	✅	❌	❌	❌	✅	✅	4
Hinge-Minimax Learner for the Ensemble of Hyperplanes	✅	❌	✅	✅	❌	❌	✅	4
How Deep Are Deep Gaussian Processes?	✅	❌	❌	❌	❌	❌	✅	2
Importance Sampling for Minibatches	✅	❌	✅	✅	❌	❌	✅	4
Improved Asynchronous Parallel Optimization Analysis for Stochastic Incremental Methods	✅	✅	✅	❌	✅	✅	✅	6
Inference via Low-Dimensional Couplings	✅	✅	✅	❌	❌	❌	✅	4
Invariant Models for Causal Transfer Learning	✅	✅	✅	✅	❌	❌	✅	5
Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling	❌	❌	❌	❌	❌	❌	❌	0
Kernel Density Estimation for Dynamical Systems	❌	❌	❌	❌	❌	❌	✅	1
Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions	❌	❌	❌	❌	❌	❌	❌	0
Learning from Comparisons and Choices	❌	✅	✅	✅	❌	❌	✅	4
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning	❌	❌	❌	❌	❌	❌	❌	0
Markov Blanket and Markov Boundary of Multiple Variables	✅	❌	✅	✅	✅	✅	✅	6
Maximum Selection and Sorting with Adversarial Comparators	✅	❌	❌	❌	❌	❌	❌	1
Model-Free Trajectory-based Policy Optimization with Monotonic Improvement	✅	❌	❌	❌	❌	❌	✅	2
Modular Proximal Optimization for Multidimensional Total-Variation Regularization	✅	✅	✅	✅	❌	❌	✅	5
Multivariate Bayesian Structural Time Series Model	✅	❌	❌	❌	❌	❌	✅	2
Numerical Analysis near Singularities in RBF Networks	❌	❌	✅	❌	❌	❌	✅	2
On Generalized Bellman Equations and Temporal-Difference Learning	❌	❌	✅	❌	❌	❌	✅	2
On Semiparametric Exponential Family Graphical Models	✅	❌	✅	❌	❌	❌	✅	3
On Tight Bounds for the Lasso	❌	❌	❌	❌	❌	❌	❌	0
Online Bootstrap Confidence Intervals for the Stochastic Gradient Descent Estimator	❌	❌	✅	❌	❌	❌	✅	2
OpenEnsembles: A Python Resource for Ensemble Clustering	✅	✅	✅	❌	❌	❌	✅	4
Optimal Bounds for Johnson-Lindenstrauss Transformations	❌	❌	❌	❌	❌	❌	❌	0
Optimal Quantum Sample Complexity of Learning Algorithms	❌	❌	❌	❌	❌	❌	❌	0
Parallelizing Spectrally Regularized Kernel Algorithms	❌	❌	❌	❌	❌	❌	✅	1
Patchwork Kriging for Large-scale Gaussian Process Regression	✅	❌	✅	✅	✅	❌	✅	5
Profile-Based Bandit with Unknown Profiles	✅	❌	❌	❌	❌	❌	✅	2
RSG: Beating Subgradient Method without Smoothness and Strong Convexity	✅	❌	✅	❌	❌	❌	✅	3
Random Forests, Decision Trees, and Categorical Predictors: The "Absent Levels" Problem	❌	❌	✅	❌	❌	✅	❌	2
Refining the Confidence Level for Optimistic Bandit Strategies	❌	❌	❌	❌	❌	❌	✅	1
Regularized Optimal Transport and the Rot Mover's Distance	✅	✅	✅	✅	❌	❌	✅	5
Reverse Iterative Volume Sampling for Linear Regression	✅	❌	✅	❌	❌	❌	❌	2
Robust PCA by Manifold Optimization	✅	✅	✅	❌	❌	❌	✅	4
Robust Synthetic Control	✅	❌	✅	✅	❌	❌	✅	4
Scalable Bayes via Barycenter in Wasserstein Space	✅	✅	✅	✅	✅	✅	✅	7
Scaling up Data Augmentation MCMC via Calibration	❌	❌	✅	✅	❌	✅	✅	4
Scikit-Multiflow: A Multi-output Streaming Framework	✅	✅	❌	❌	❌	❌	❌	2
Seglearn: A Python Package for Learning Sequences and Time Series	❌	✅	✅	✅	✅	✅	✅	6
Short-term Sparse Portfolio Optimization Based on Alternating Direction Method of Multipliers	✅	❌	✅	❌	✅	❌	✅	4
Simple Classification Using Binary Data	✅	❌	✅	✅	❌	❌	✅	4
Sparse Estimation in Ising Model via Penalized Monte Carlo Methods	✅	❌	✅	❌	✅	❌	✅	4
State-by-state Minimax Adaptive Estimation for Nonparametric Hidden Markov Models	✅	❌	✅	✅	❌	❌	✅	4
Statistical Analysis and Parameter Selection for Mapper	❌	✅	✅	❌	❌	❌	✅	3
Streaming kernel regression with provably adaptive mean, variance, and regularization	✅	❌	❌	❌	❌	❌	✅	2
The Implicit Bias of Gradient Descent on Separable Data	❌	✅	✅	❌	❌	❌	✅	3
The xyz algorithm for fast interaction search in high-dimensional data	✅	✅	✅	❌	✅	❌	✅	5
Theoretical Analysis of Cross-Validation for Estimating the Risk of the $k$-Nearest Neighbor Classifier	❌	❌	❌	❌	❌	❌	✅	1
ThunderSVM: A Fast SVM Library on GPUs and CPUs	❌	✅	✅	❌	✅	❌	✅	4
Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems	❌	❌	❌	❌	❌	❌	❌	0
Using Side Information to Reliably Learn Low-Rank Matrices from Missing and Corrupted Observations	✅	❌	✅	✅	❌	❌	✅	4