Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Consistent Interpolating Ensembles via the Manifold-Hilbert Kernel
Authors: Yutong Wang, Clay Scott
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Recent research in the theory of overparametrized learning has sought to establish generalization guarantees in the interpolating regime. Such results have been established for a few common classes of methods, but so far not for ensemble methods. We devise an ensemble classification method that simultaneously interpolates the training data, and is consistent for a broad class of data distributions. To this end, we define the manifold-Hilbert kernel for data distributed on a Riemannian manifold. We prove that kernel smoothing regression and classification using the manifold-Hilbert kernel are weakly consistent in the setting of Devroye et al. [22]. For the sphere, we show that the manifold-Hilbert kernel can be realized as a weighted random partition kernel, which arises as an infinite ensemble of partition-based classifiers. |
| Researcher Affiliation | Academia | Yutong Wang University of Michigan EMAIL Clayton D. Scott University of Michigan EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. In the 'If you ran experiments...' section, all items related to code and data are marked as N/A. |
| Open Datasets | No | The paper is theoretical and does not mention the use of any specific public or open dataset for training experiments. The 'If you ran experiments...' section states N/A for experimental details. |
| Dataset Splits | No | The paper is theoretical and does not provide specific dataset split information (e.g., percentages, sample counts) for training, validation, or test sets. The 'If you ran experiments...' section states N/A for experimental details. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used to run experiments. The 'If you ran experiments...' section states N/A for experimental details like compute resources. |
| Software Dependencies | No | The paper is theoretical and does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate experiments. The 'If you ran experiments...' section states N/A for experimental details. |
| Experiment Setup | No | The paper is theoretical and does not provide specific details about an experimental setup, such as hyperparameters or system-level training settings. The 'If you ran experiments...' section states N/A for experimental details like training details. |
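
The abstract quoted in the Research Type row describes kernel smoothing regression and classification with a kernel that is singular at zero distance, which is what lets the estimate interpolate the training data. The following is a minimal illustrative sketch of that idea on the unit sphere, not code from the paper (which provides none). It assumes the kernel has the Hilbert-kernel form of Devroye et al. with geodesic distance in place of the Euclidean norm, i.e. weights `dist(x, z) ** (-dim)` where `dim` is the manifold dimension; that form should be checked against the paper's definition. The function names `geodesic_distance`, `hilbert_smoother`, and `classify` are hypothetical.

```python
import math


def geodesic_distance(x, z):
    """Great-circle distance between two unit vectors, computed from the chord."""
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    # The chord form returns exactly 0.0 when x and z are identical.
    return 2.0 * math.asin(min(1.0, chord / 2.0))


def hilbert_smoother(train_x, train_y, query, dim):
    """Kernel smoothing estimate at `query` with weights dist ** (-dim).

    Assumption: this weight is an illustrative stand-in for the paper's
    manifold-Hilbert kernel on the sphere, with `dim` the sphere's dimension.
    """
    dists = [geodesic_distance(query, x) for x in train_x]
    # The kernel is singular at distance zero, so a query that coincides with
    # training points takes the mean of their labels (interpolation).
    hits = [y for d, y in zip(dists, train_y) if d == 0.0]
    if hits:
        return sum(hits) / len(hits)
    # Rescale by the smallest distance so the largest weight is 1. The ratio
    # of weighted sums is unchanged and overflow for tiny distances is avoided.
    d_min = min(dists)
    weights = [(d_min / d) ** dim for d in dists]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def classify(train_x, train_y, query, dim):
    """Plug-in classifier for labels in {-1, +1}: sign of the smoothed value."""
    return 1 if hilbert_smoother(train_x, train_y, query, dim) >= 0 else -1
```

Calling `hilbert_smoother` at a training point returns that point's label, while a query away from the training set gets a distance-weighted average of all labels. The sketch only illustrates the interpolation mechanism; it says nothing about the consistency result, which is the paper's theoretical contribution.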