Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
Authors: Alix Lhéritier, Frédéric Cazals
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Alix Lhéritier, Amadeus SAS, F-06902 Sophia-Antipolis, France, EMAIL; Frédéric Cazals, Université Côte d'Azur, Inria, F-06902 Sophia-Antipolis, France, EMAIL |
| Pseudocode | No | Section 5 'Online algorithm' describes the algorithm's steps in prose, stating 'The steps of our algorithm are the same as those of [33, Algorithm 1]'. However, it does not present a structured pseudocode block or algorithm box within the paper itself. |
| Open Source Code | Yes | Python code and data used for the experiments are available at https://github.com/alherit/kd-switch. |
| Open Datasets | Yes | We use the following datasets, detailed in Appendix B.1: (L-i) A 2D dataset consists of two Gaussian Mixtures spanning three different scales. (L-ii) A dataset in dimension d = 784 composed of both real MNIST digits, as well as digits generated by a Generative Adversarial Network [24] trained on the MNIST dataset. (L-iii) The Higgs dataset [20], the goal being to distinguish the signature of processes producing Higgs bosons. (L-iv) The Breast Cancer Wisconsin (Diagnostic) Data Set [20] dimension d = 30. |
| Dataset Splits | No | The paper describes how data is fed to online predictors and mentions a 'train-test paradigm' in the context of comparing with other methods (e.g., 'the train-test paradigm as opposed to KDS-seq which automatically detects the pertinent scales'), but it does not specify explicit train/validation/test splits, percentages, or cross-validation details for its own experiments. |
| Hardware Specification | Yes | Experiments were carried out on a machine running Debian 3.16, equipped with two Intel(R) Xeon(R) E5-2667 v2 @ 3.30GHz processors and 62 GB of RAM. |
| Software Dependencies | No | The paper mentions 'Python code' and states 'Our implementation uses the scikit-learn Gaussian Process Classifier [23]', but it does not specify version numbers for Python or scikit-learn. |
| Experiment Setup | Yes | We compare the performance of our online predictors Pkds and Pkdw (see Rmk. 2) with a number of trees J ∈ {1, 50}... The Bayesian Mixture of Gaussian Processes Classifiers (gp) with RBF kernel width σ ∈ {2^{4i}}_{i=5...7}... The significance level is set to α = .01 in all the cases. |
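
The Experiment Setup row mentions a Bayesian mixture of classifiers taken over a grid of kernel widths. As a rough illustration of how a Bayesian mixture of online predictors operates in general, the sketch below keeps one posterior weight per predictor, predicts with the weighted average, and reweights each predictor by the probability it assigned to the observed label. This is not the authors' implementation: the function name `bayesian_mixture_online` and the constant-probability experts are hypothetical stand-ins for the Gaussian Process classifiers, and the label sequence is made up for the example.

```python
import math


def bayesian_mixture_online(experts, labels):
    """Sequentially predict binary labels with a Bayesian mixture of experts.

    experts: list of callables; each maps the list of past labels to
             P(next label = 1), which must lie strictly between 0 and 1.
    labels:  sequence of observed labels (0 or 1), revealed one at a time.

    Returns (probs, weights): the mixture probability assigned to each
    observed label, and the final normalized posterior weights.
    """
    n = len(experts)
    log_w = [math.log(1.0 / n)] * n  # uniform prior over experts (assumption)
    probs, past = [], []

    def normalized(log_weights):
        # Subtract the max before exponentiating for numerical stability.
        m = max(log_weights)
        w = [math.exp(lw - m) for lw in log_weights]
        z = sum(w)
        return [wk / z for wk in w]

    for y in labels:
        # Probability each expert assigns to the label that actually occurred.
        lik = [e(past) if y == 1 else 1.0 - e(past) for e in experts]
        # Mixture prediction: posterior-weighted average of the experts.
        w = normalized(log_w)
        probs.append(sum(wk * lk for wk, lk in zip(w, lik)))
        # Bayes update: multiply each weight by the expert's likelihood.
        log_w = [lw + math.log(lk) for lw, lk in zip(log_w, lik)]
        past.append(y)

    return probs, normalized(log_w)


# Hypothetical experts that ignore the past and always predict a fixed P(1).
experts = [lambda past: 0.2, lambda past: 0.5, lambda past: 0.8]
probs, weights = bayesian_mixture_online(experts, [1, 1, 1, 0, 1, 1])
print(weights)
```

On a sequence dominated by ones, the posterior mass concentrates on the expert that predicts ones with the highest probability, which is the mechanism that lets a mixture track the best setting in a grid without a separate validation split.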