Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Knowledge Distillation of Uncertainty using Deep Latent Factor Model
Authors: Sehyun Park, Jongjin Lee, Yunseop Shin, Ilsang Ohn, Yongdai Kim
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we investigate Gaussian distillation by analyzing multiple benchmark datasets. We compare Gaussian distillation with existing baselines, including naive distillation (one-to-one distillation without sharing weights between student DNNs, small-Ens), Hydra [11], and BE [12], for regression and classification problems as well as fine-tuning of language models, in terms of uncertainty quantification. For classification, we also evaluate Proxy Dirichlet Distillation (Proxy-End2) [18] and Ensemble Distillation via Flow Matching (EDFM) [45]. In addition, we show that a pre-trained DLF outperforms its competitors on distribution shift problems. |
| Researcher Affiliation | Collaboration | Sehyun Park (Department of Statistics, Seoul National University); Jongjin Lee (Samsung Research); Yunseop Shin (Department of Statistics, Seoul National University); Ilsang Ohn (Department of Statistics, Inha University); Yongdai Kim (Department of Statistics, Seoul National University) |
| Pseudocode | Yes | Algorithm 1: EM algorithm for the univariate DLF model |
| Open Source Code | Yes | The source code of DLF is publicly available at https://github.com/sehyun1094/DLF. |
| Open Datasets | Yes | We analyze six benchmark datasets from the UCI repository [46], including Boston housing, Concrete, Energy, Wine, Power Plant, and Kin8nm... CIFAR-10 and CIFAR-100 consist of 50,000 training and 10,000 test images... We analyze three GLUE [50] and SuperGLUE [51] sub-tasks: RTE, MRPC, and WiC. |
| Dataset Splits | Yes | Each dataset is randomly split into 90% training and 10% testing... In this experiment, the training data are further split into 80% training and 20% validation... The entire dataset is partitioned into three disjoint subsets, $D = D_{\text{train}}^{\text{teacher}} \cup D_{\text{train}}^{\text{new}} \cup D_{\text{test}}$, with a fixed ratio of 4.5 : 4.5 : 1. |
| Hardware Specification | Yes | All our experiments are run with Python 3.9.16 on an Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz, an NVIDIA TITAN Xp GPU, and 128GB RAM. |
| Software Dependencies | Yes | All our experiments are run with Python 3.9.16... The Adam optimizer [53] is used for optimization. |
| Experiment Setup | Yes | We obtain 50 teacher DNNs, each with two hidden layers of 100 nodes... The student architecture comprises a one-hidden-layer MLP with 50 units. Training lasts 200 epochs on a single GPU using SGD with Nesterov momentum of 0.9, weight decay of 5e-4, and a batch size of 128. A one-cycle cosine annealing schedule with a five-epoch linear warm-up (from 0.001 to 0.1) is employed. |
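The Dataset Splits row describes random partitions at three ratios (90/10, then 80/20 within training, and 4.5 : 4.5 : 1 for teacher-training, new-training, and test data). The following is a minimal plain-Python sketch of such disjoint index splits. The helper name `split_indices`, the seeds, and the 1,000-sample dataset size are hypothetical, and the authors' own code in the linked repository may shuffle or round differently.

```python
import random


def split_indices(n, ratios, seed=0):
    """Randomly partition range(n) into disjoint index lists (hypothetical helper).

    Each part except the last gets floor(n * r / sum(ratios)) indices;
    the last part takes the remainder, so every index is used exactly once.
    """
    total = sum(ratios)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    parts, start = [], 0
    for i, r in enumerate(ratios):
        end = n if i == len(ratios) - 1 else start + int(n * r / total)
        parts.append(idx[start:end])
        start = end
    return parts


# 4.5 : 4.5 : 1 split into D_train^teacher, D_train^new, D_test.
d_teacher, d_new, d_test = split_indices(1000, [4.5, 4.5, 1])

# 90/10 train/test split, then 80/20 fit/validation within the training part.
train, held_out = split_indices(1000, [9, 1], seed=1)
fit_pos, val_pos = split_indices(len(train), [8, 2], seed=2)
fit = [train[p] for p in fit_pos]  # map positions back to dataset indices
val = [train[p] for p in val_pos]

print(len(d_teacher), len(d_new), len(d_test))  # 450 450 100
print(len(fit), len(val), len(held_out))  # 720 180 100
```

Assigning the remainder to the last part keeps the subsets disjoint and exhaustive even when the ratios do not divide the dataset size evenly.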
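The Experiment Setup row specifies 200 training epochs with a five-epoch linear warm-up from 0.001 to 0.1 followed by cosine annealing. One way to compute such a per-epoch learning rate is sketched below. The per-epoch (rather than per-step) granularity, the final learning rate of 0 (`lr_min`), and the function name `lr_at` are assumptions not stated in the report.

```python
import math


def lr_at(epoch, total_epochs=200, warmup_epochs=5,
          lr_start=0.001, lr_peak=0.1, lr_min=0.0):
    """Learning rate for a 0-indexed epoch (hypothetical helper).

    Epochs 0..warmup_epochs-1: linear ramp from lr_start toward lr_peak.
    Remaining epochs: cosine decay from lr_peak toward lr_min (assumed 0).
    """
    if epoch < warmup_epochs:
        return lr_start + (lr_peak - lr_start) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_peak - lr_min) * (1 + math.cos(math.pi * progress))


schedule = [lr_at(e) for e in range(200)]
print(schedule[0], schedule[5])  # warm-up start and cosine peak
```

In PyTorch, a schedule of similar shape could be assembled from `LinearLR` and `CosineAnnealingLR` chained with `SequentialLR` in `torch.optim.lr_scheduler`, though the warm-up endpoints and step granularity would need to be checked against the authors' configuration.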