Neural Conditional Probability for Uncertainty Quantification

Authors: Vladimir Kostic, Grégoire Pacreau, Giacomo Turri, Pietro Novelli, Karim Lounici, Massimiliano Pontil

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that NCP with a 2-hidden-layer network matches or outperforms leading methods. This demonstrates that a minimalistic architecture with a theoretically grounded loss can achieve competitive results, even against more complex architectures.
Researcher Affiliation | Academia | (1) CSML, Istituto Italiano di Tecnologia; (2) University of Novi Sad; (3) CMAP, École Polytechnique; (4) AI Centre, University College London
Pseudocode | Yes | Algorithm 1: Separable density learning procedure (a hedged training-step sketch appears after the table).
Open Source Code | Yes | Code is available at https://github.com/CSML-IIT-UCL/NCP.
Open Datasets | Yes | To sample data from Econ Density, Arma Jump, Gaussian Mixture, and Skew Normal, we used the Conditional Density Estimation library (Rothfuss et al., 2019), available at https://github.com/freelunchtheorem/Conditional_Density_Estimation, and the Student Performance dataset, available at https://www.kaggle.com/datasets/nikhil7280/student-performance-multiple-linear-regression/data (a sampling sketch appears after the table).
Dataset Splits | Yes | Training set sizes ranging from 10^2 to 10^5, with a validation set of 10^3 samples.
Hardware Specification | Yes | Experiments were conducted on a high-performance computing cluster equipped with an Intel(R) Xeon(R) Silver 4210 (Sky Lake) CPU @ 2.20 GHz, 377 GB RAM, and an NVIDIA Tesla V100 16 GB GPU.
Software Dependencies | No | The paper mentions various software components and libraries (e.g., 'normflows', the 'rfcde' library) but does not provide a comprehensive list of key software dependencies with specific version numbers (e.g., Python, PyTorch, or CUDA versions) required to reproduce its own implementation.
Experiment Setup | Yes | We trained an NCP model with u_θ and v_θ as multi-layer perceptrons (MLPs), each having two hidden layers of 64 units with GELU activations in between. The vector σ_θ has a size of d = 100, and γ is set to 10^-3. Optimization was performed over 10^4 epochs using the Adam optimizer with a learning rate of 10^-3. Early stopping was applied based on the validation set, with a patience of 1000 epochs (a model-setup sketch appears after the table).
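
The synthetic benchmarks listed under Open Datasets can be sampled with the Conditional Density Estimation library. Below is a minimal sketch; the class and method names (EconDensity, ArmaJump, GaussianMixture, SkewNormal, simulate) reflect our reading of the library's density_simulation module and may differ across versions.

```python
# Hedged sketch: sampling synthetic data with the Conditional Density
# Estimation library (Rothfuss et al., 2019). API names are assumptions.
from cde.density_simulation import EconDensity, ArmaJump, GaussianMixture, SkewNormal

n_train, n_val = 10**4, 10**3  # one training size from the paper's 10^2-10^5 range

sim = EconDensity()            # or ArmaJump(), GaussianMixture(), SkewNormal()
X_train, Y_train = sim.simulate(n_samples=n_train)
X_val, Y_val = sim.simulate(n_samples=n_val)
```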
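
The Experiment Setup row translates almost directly into code. The following is a minimal PyTorch sketch of the described architecture, not the authors' implementation; input dimensions and all variable names are our own placeholders.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Two hidden layers of 64 units with GELU activations, per the setup row."""
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

d = 100                               # size of the vector sigma_theta
u_theta = MLP(in_dim=1, out_dim=d)    # input dims depend on the dataset
v_theta = MLP(in_dim=1, out_dim=d)
sigma_theta = nn.Parameter(torch.randn(d))  # learnable singular values

params = [*u_theta.parameters(), *v_theta.parameters(), sigma_theta]
optimizer = torch.optim.Adam(params, lr=1e-3)  # lr = 10^-3 as reported
# Training runs for up to 10^4 epochs with early stopping on the
# validation loss (patience 1000), not shown here.
```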
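
Finally, a rough illustration of a training step for the separable model of Algorithm 1. The loss below is the standard least-squares density-ratio objective for f(x, y) = Σ_i σ_i u_i(x) v_i(y) ≈ p(x, y)/(p(x) p(y)) − 1, from which p(y|x) ≈ p(y)(1 + f(x, y)); the paper's exact loss and its γ-weighted regularization term are not reproduced here.

```python
import torch

def separable_loss(u, v, sigma):
    """Least-squares density-ratio objective for the separable model
    f(x, y) = sum_i sigma_i * u_i(x) * v_i(y), an assumed stand-in for
    the paper's loss. u, v: (batch, d) embeddings of paired samples."""
    f_joint = (u * sigma * v).sum(dim=1)   # f evaluated on paired (x_j, y_j)
    f_prod = (u * sigma) @ v.T             # (batch, batch): every x with every y
    return f_prod.pow(2).mean() - 2.0 * f_joint.mean()

# One optimization step, reusing u_theta, v_theta, sigma_theta, optimizer above:
# u = u_theta(x_batch); v = v_theta(y_batch)
# loss = separable_loss(u, v, sigma_theta)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```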