Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Journal of Machine Learning Research (JMLR) - 2017

Documentation Rate of Empirical Papers by Reproducibility Variable

Distribution of Empirical Papers by Number of Documented Variables

Venue | Year | Papers | Reproducibility Score | Documentation Score | % Empirical | % Industry
JMLR | 2017 | 234 | 0.42 | 3.57 | 83.33% | 21.03%

Reproducibility Score: based on Gundersen et al. (2025). See Methods for details.
Documentation Score: the average score over the seven reproducibility variables for empirical research papers. See Methods for details.
% Empirical: percentage of papers that are empirical research rather than theoretical research.
% Industry: percentage of empirical research papers with at least one author from industry.
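
As a rough illustration of how the summary figures could be derived from per-paper labels, the sketch below computes a documentation score and the empirical share. It assumes the documentation score is the mean number of documented variables per empirical paper (a reading suggested by a value of 3.57 against seven variables; the Methods page remains authoritative). The record layout, function names, and toy data are hypothetical and are not taken from the underlying pipeline.

```python
from statistics import mean

# The seven reproducibility variables listed on this page.
VARIABLES = [
    "Pseudocode",
    "Open Source Code",
    "Open Datasets",
    "Dataset Splits",
    "Hardware Specification",
    "Software Dependencies",
    "Experiment Setup",
]


def documentation_score(papers):
    """Mean number of documented variables over empirical papers.

    `papers` is a list of dicts with a hypothetical layout:
    {"empirical": bool, "documented": set of variable names}.
    Assumption: theoretical papers are excluded from the average.
    """
    known = set(VARIABLES)
    counts = [len(p["documented"] & known) for p in papers if p["empirical"]]
    return mean(counts) if counts else 0.0


def percent_empirical(papers):
    """Share of papers flagged as empirical, as a percentage."""
    if not papers:
        return 0.0
    return 100.0 * sum(1 for p in papers if p["empirical"]) / len(papers)


# Toy records, invented for illustration only.
toy = [
    {"empirical": True, "documented": {"Pseudocode", "Open Datasets"}},
    {
        "empirical": True,
        "documented": {
            "Pseudocode",
            "Open Source Code",
            "Open Datasets",
            "Experiment Setup",
        },
    },
    {"empirical": False, "documented": set()},
]

print(documentation_score(toy))
print(round(percent_empirical(toy), 2))
```

The toy records only exercise the arithmetic; they say nothing about the actual JMLR 2017 labels.
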
Reproducibility variables: Pseudocode, Open Source Code, Open Datasets, Dataset Splits, Hardware Specification, Software Dependencies, Experiment Setup
A Bayesian Framework for Learning Rule Sets for Interpretable Classification 4
A Bayesian Mixed-Effects Model to Learn Trajectories of Changes from Repeated Manifold-Valued Observations 3
A Cluster Elastic Net for Multivariate Regression 6
A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization 3
A Nonconvex Approach for Phase Retrieval: Reshaped Wirtinger Flow and Incremental Algorithms 4
A Robust-Equitable Measure for Feature Ranking and Selection 3
A Spectral Algorithm for Inference in Hidden semi-Markov Models 4
A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning 3
A Survey of Preference-Based Reinforcement Learning Methods 1
A Theory of Learning with Corrupted Labels 0
A Tight Bound of Hard Thresholding 4
A Unified Formulation and Fast Accelerated Proximal Gradient Method for Classification 6
A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation 5
A distributed block coordinate descent method for training l1 regularized linear classifiers 4
A survey of Algorithms and Analysis for Adaptive Online Learning 1
Accelerating Stochastic Composition Optimization 2
Achieving Optimal Misclassification Proportion in Stochastic Block Models 2
Active Nearest-Neighbor Learning in Metric Spaces 1
Active-set Methods for Submodular Minimization Problems 4
Adaptive Randomized Dimension Reduction on Massive Data 4
An $\ell_{\infty}$ Eigenvector Perturbation Bound and Its Application 1
An Easy-to-hard Learning Paradigm for Multiple Classes and Multiple Labels 5
An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback 1
Analyzing Tensor Power Method Dynamics in Overcomplete Regime 1
Angle-based Multicategory Distance-weighted SVM 4
Approximation Vector Machines for Large-scale Online Learning 6
Asymptotic Analysis of Objectives Based on Fisher Information in Active Learning 1
Asymptotic behavior of Support Vector Machine for spiked population model 3
Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA 4
Automatic Differentiation Variational Inference 6
Automatic Differentiation in Machine Learning: a Survey 3
Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization 0
Averaged Collapsed Variational Bayes Inference 3
Bayesian Inference for Spatio-temporal Spike-and-Slab Priors 6
Bayesian Learning of Dynamic Multilayer Networks 4
Bayesian Network Learning via Topological Order 5
Bayesian Tensor Regression 5
Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits 1
Breaking the Curse of Dimensionality with Convex Neural Networks 1
Bridging Supervised Learning and Test-Based Co-optimization 0
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Evolution 5
Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice 5
Certifiably Optimal Low Rank Factor Analysis 4
Characteristic and Universal Tensor Product Kernels 0
Classification of Time Sequences using Graphs of Temporal Constraints 5
Clustering from General Pairwise Observations with Applications to Time-varying Graphs 3
Clustering with Hidden Markov Model on Variable Blocks 5
CoCoA: A General Framework for Communication-Efficient Distributed Optimization 6
Communication-efficient Sparse Regression 1
Community Detection and Stochastic Block Models: Recent Developments 2
Community Extraction in Multilayer Networks with Heterogeneous Community Structure 4
Compact Convex Projections 4
Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs 1
Computational Limits of A Distributed Algorithm for Smoothing Spline 0
Concentration inequalities for empirical processes of linear time series 0
Confidence Sets with Expected Sizes for Multiclass Classification 3
Consistency, Breakdown Robustness, and Algorithms for Robust Improper Maximum Likelihood Clustering 4
Convergence Analysis of Distributed Inference with Vector-Valued Gaussian Belief Propagation 0
Convergence of Unregularized Online Learning Algorithms 0
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding 2
Cost-Sensitive Learning with Noisy Labels 4
Deep Learning the Ising Model Near Criticality 1
Dense Distributions from Sparse Samples: Improved Gibbs Sampling Parameter Estimators for LDA 5
Density Estimation in Infinite Dimensional Exponential Families 1
Differential Privacy for Bayesian Inference through Posterior Sampling 1
Dimension Estimation Using Random Connection Models 2
Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server 6
Distributed Learning with Regularized Least Squares 0
Distributed Semi-supervised Learning with Kernel Ridge Regression 3
Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks 1
Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement 4
Divide-and-Conquer for Debiased $l_1$-norm Support Vector Machine in Ultra-high Dimensions 1
Document Neural Autoregressive Distribution Estimation 4
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity 5
Efficient Sampling from Time-Varying Log-Concave Distributions 0
Empirical Evaluation of Resampling Procedures for Optimising SVM Hyperparameters 4
Enhancing Identification of Causal Effects by Pruning 2
Estimation of Graphical Models through Structured Norm Minimization 6
Exact Learning of Lightweight Description Logic Ontologies 1
Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers 4
Faithfulness of Probability Distributions and Graphs 0
Fisher Consistency for Prior Probability Shift 3
Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities 2
From Predictive Methods to Missing Data Imputation: An Optimization Approach 5
Fundamental Conditions for Low-CP-Rank Tensor Completion 0
GFA: Exploratory Analysis of Multiple Data Sources with Group Factor Analysis 3
GPflow: A Gaussian Process Library using TensorFlow 3
Gap Safe Screening Rules for Sparsity Enforcing Penalties 5
Gaussian Lower Bound for the Information Bottleneck Limit 3
Generalized Conditional Gradient for Sparse Estimation 6
Generalized Pólya Urn for Time-Varying Pitman-Yor Processes 4
Generalized SURE for optimal shrinkage of singular values in low-rank matrix denoising 2
Gradient Estimation with Simultaneous Perturbation and Compressive Sensing 4
Gradient Hard Thresholding Pursuit 6
Group Sparse Optimization via $\ell_{p,q}$ Regularization 6
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression 1
Hierarchical Clustering via Spreading Metrics 4
Hierarchically Compositional Kernels for Scalable Nonparametric Learning 5
Hinge-Loss Markov Random Fields and Probabilistic Soft Logic 7
HyperTools: a Python Toolbox for Gaining Geometric Insights into High-Dimensional Data 4
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization 5
Identifying Unreliable and Adversarial Workers in Crowdsourced Labeling Tasks 3
Identifying a Minimal Class of Models for High-dimensional Data 2
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning 1
Improved spectral community detection in large heterogeneous networks 4
Improving Variational Methods via Pairwise Linear Response Identities 3
In Search of Coherence and Consensus: Measuring the Interpretability of Statistical Topics 2
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles 3
Interactive Algorithms: Pool, Stream and Precognitive Stream 1
JSAT: Java Statistical Analysis Tool, a Library for Machine Learning 4
Joint Label Inference in Networks 4
KELP: a Kernel-based Learning Platform 2
Katyusha: The First Direct Acceleration of Stochastic Gradient Methods 3
Kernel Method for Persistence Diagrams via Kernel Embedding and Weight Factor 4
Kernel Partial Least Squares for Stationary Data 2
Knowledge Graph Completion via Complex Tensor Factorization 6
Learning Certifiably Optimal Rule Lists for Categorical Data 6
Learning Instrumental Variables with Structural and Non-Gaussianity Assumptions 4
Learning Local Dependence In Ordered Data 5
Learning Partial Policies to Speedup MDP Tree Search via Reduction to I.I.D. Learning 2
Learning Quadratic Variance Function (QVF) DAG Models via OverDispersion Scoring (ODS) 4
Learning Scalable Deep Kernels with Recurrent Structure 5
Learning Theory of Distributed Regression with Bias Corrected Regularization Kernel Network 1
Lens Depth Function and k-Relative Neighborhood Graph: Versatile Tools for Ordinal Data Analysis 6
Local Identifiability of $\ell_1$-minimization Dictionary Learning: a Sufficient and Almost Necessary Condition 2
Local algorithms for interactive clustering 2
Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research 1
Making Decision Trees Feasible in Ultrahigh Feature and Label Dimensions 4
Matrix Completion with Noisy Entries and Outliers 4
Maximum Likelihood Estimation for Mixtures of Spherical Gaussians is NP-hard 0
Maximum Principle Based Algorithms for Deep Learning 4
Memory Efficient Kernel Approximation 6
Minimax Estimation of Kernel Mean Embeddings 0
Minimax Filter: Learning to Preserve Privacy from Inference Attacks 5
Mode-Seeking Clustering and Density Ridge Estimation via Direct Estimation of Density-Derivative-Ratios 4
Multiscale Strategies for Computing Optimal Transport 4
Nearly optimal classification for semimetrics 2
Non-parametric Policy Search with Limited Information Loss 3
Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization 4
Nonparametric Risk Bounds for Time-Series Forecasting 2
Normal Bandits of Unknown Means and Variances 2
On $b$-bit Min-wise Hashing for Large-scale Regression and Classification with Sparse Data 2
On Binary Embedding using Circulant Matrices 4
On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models 2
On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization 1
On Markov chain Monte Carlo methods for tall data 5
On Perturbed Proximal Gradient Algorithms 2
On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness 1
On the Consistency of Ordinal Regression Methods 2
On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions 2
On the Propagation of Low-Rate Measurement Error to Subgraph Counts in Large Networks 0
On the Stability of Feature Selection Algorithms 4
Online Bayesian Passive-Aggressive Learning 5
Online Learning to Rank with Top-k Feedback 3
Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling 3
Optimal Dictionary for Least Squares Representation 2
Optimal Rates for Multi-pass Stochastic Gradient Methods 3
POMDPs.jl: A Framework for Sequential Decision Making under Uncertainty 1
Parallel Symmetric Class Expression Learning 4
Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification 2
Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models 5
Perishability of Data: Dynamic Pricing under Varying-Coefficient Models 2
Permuted and Augmented Stick-Breaking Bayesian Multinomial Regression 3
Persistence Images: A Stable Vector Representation of Persistent Homology 4
Poisson Random Fields for Dynamic Feature Models 4
Post-Regularization Inference for Time-Varying Nonparanormal Graphical Models 2
Preference-based Teaching 0
Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model 5
Probabilistic Line Searches for Stochastic Optimization 5
Probabilistic preference learning with the Mallows rank model 4
Provably Correct Algorithms for Matrix Column Subset Selection with Selectively Sampled Data 3
Pycobra: A Python Toolbox for Ensemble Learning and Visualisation 2
Quantifying the Informativeness of Similarity Measurements 7
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations 6
Rank Determination for Low-Rank Data Completion 0
Rate of Convergence of $k$-Nearest-Neighbor Classification Rule 0
Reconstructing Undirected Graphs from Eigenspaces 3
Recovering PCA and Sparse PCA via Hybrid-(l1,l2) Sparse Sampling of Data Elements 3
Refinery: An Open Source Topic Modeling Web Platform 1
Regularization and the small-ball method II: complexity dependent error rates 0
Regularized Estimation and Testing for High-Dimensional Multi-Block Vector-Autoregressive Models 3
Relational Reinforcement Learning for Planning with Exogenous Effects 3
Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks 4
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria 2
Robust Discriminative Clustering with Sparse Regularizers 4
Robust Topological Inference: Distance To a Measure and Kernel Distance 2
Robust and Scalable Bayes via a Median of Subset Posterior Measures 4
SGDLibrary: A MATLAB library for stochastic optimization algorithms 3
STORE: Sparse Tensor Response Regression and Neuroimaging Analysis 4
Saturating Splines and Feature Selection 5
Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks 5
Second-Order Stochastic Optimization for Machine Learning in Linear Time 4
Sharp Oracle Inequalities for Square Root Regularization 3
Significance-based community detection in weighted networks 4
Simple, Robust and Optimal Ranking from Pairwise Comparisons 1
Simplifying Probabilistic Expressions in Causal Inference 2
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models 4
Sketched Ridge Regression: Optimization Perspective, Statistical Perspective, and Model Averaging 4
SnapVX: A Network-Based Convex Optimization Solver 3
Soft Margin Support Vector Classification as Buffered Probability Minimization 1
Sparse Concordance-assisted Learning for Optimal Treatment Decision 4
Sparse Exchangeable Graphs and Their Limits via Graphon Processes 0
Spectral Clustering Based on Local PCA 4
Stability of Controllers for Gaussian Process Dynamics 3
Stabilized Sparse Online Learning for Sparse Data 4
Statistical Inference on Random Dot Product Graphs: a Survey 4
Statistical Inference with Unnormalized Discrete Models and Localized Homogeneous Divergences 3
Statistical and Computational Guarantees for the Baum-Welch Algorithm 1
Steering Social Activity: A Stochastic Optimal Control Point Of View 4
Stochastic Gradient Descent as Approximate Bayesian Inference 4
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization 3
Submatrix localization via message passing 1
Surprising properties of dropout in deep networks 2
Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks 5
Tests of Mutual or Serial Independence of Random Vectors with Applications 5
The DFS Fused Lasso: Linear-Time Denoising over General Graphs 3
The Impact of Random Models on Clustering Similarity 3
The MADP Toolbox: An Open Source Library for Planning and Learning in (Multi-)Agent Systems 1
The Search Problem in Mixture Models 3
Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis 4
Time-Accuracy Tradeoffs in Kernel Prediction: Controlling Prediction Quality 5
To Tune or Not to Tune the Number of Trees in Random Forest 4
Training Gaussian Mixture Models at Scale via Coresets 4
Two New Approaches to Compressed Sensing Exhibiting Both Robust Sparse Recovery and the Grouping Effect 2
Uncovering Causality from Multivariate Hawkes Integrated Cumulants 5
Uniform Hypergraph Partitioning: Provable Tensor Methods and Sampling Techniques 5
Using Conceptors to Manage Neural Long-Term Memories for Temporal Patterns 3
Variational Fourier Features for Gaussian Processes 5
Variational Particle Approximations 4
Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning 3
auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks 3
openXBOW -- Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit 5
pomegranate: Fast and Flexible Probabilistic Modeling in Python 3
tick: a Python Library for Statistical Learning, with an emphasis on Hawkes Processes and Time-Dependent Models 3
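
The list above can be tallied into the distribution named in the chart title. The sketch below assumes each row is a paper title followed by one trailing integer, read here as the number of documented variables (0 to 7). The helper names are hypothetical, and the sample rows are copied from the list.

```python
from collections import Counter


def parse_row(line):
    """Split 'Title ... N' into (title, N) at the last space."""
    title, count = line.rsplit(" ", 1)
    return title.strip(), int(count)


def distribution(lines):
    """Count how many papers fall at each number of documented variables."""
    return Counter(parse_row(line)[1] for line in lines if line.strip())


# A few rows copied from the list above.
rows = [
    "A Theory of Learning with Corrupted Labels 0",
    "Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA 4",
    "Hinge-Loss Markov Random Fields and Probabilistic Soft Logic 7",
    "Quantifying the Informativeness of Similarity Measurements 7",
]

dist = distribution(rows)
for n_documented in sorted(dist):
    print(n_documented, dist[n_documented])
```

Splitting at the last space keeps digits inside titles (such as "2.0") out of the count. The list does not mark which papers are empirical, so this tally covers all listed rows rather than only the empirical subset used for the chart.
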