Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Epistemic Uncertainty Estimation in Regression Ensemble Models with Pairwise Epistemic Estimators

Authors: Lucas Berry, David Meger

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data, Pendulum, Hopper, Ant, and Humanoid, demonstrating PairEpEsts' advantage over baselines in high-dimensional regression active learning. [...] 5 Experimental Results
Researcher Affiliation	Academia	Lucas Berry, David Meger Department of Computer Science McGill University EMAIL
Pseudocode	Yes	Algorithm 1 Active Learning Using PairEpEsts
Open Source Code	Yes	Our code can be found at https://github.com/nwaftp23/pairflow-uncertainty.
Open Datasets	Yes	To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data, Pendulum, Hopper, Ant, and Humanoid, demonstrating PairEpEsts' advantage over baselines in high-dimensional regression active learning. [...] Notably, the OpenAI Gym library was utilized, with minor modifications [6].
Dataset Splits	Yes	The training set is initialized with 100 or 200 data points for 1D and multi-dimensional environments, respectively. In each acquisition batch, 10 points are added. [...] The training sets were obtained by applying a random policy, while the test sets were generated using an expert policy. This methodology was employed to ensure diversity between the training and test datasets.
Hardware Specification	Yes	Training was conducted using 16GB RAM on Intel Gold 6148 Skylake @ 2.4 GHz CPUs and NVidia V100SXM2 (16G memory) GPUs.
Software Dependencies	No	The nflows library [18] was employed with minor modifications.
Experiment Setup	Yes	The Nflows Base model employed one nonlinear transformation, g, with a single hidden layer containing 20 units, utilizing cubic spline flows as per [17]. The base network consisted of two hidden layers, each comprising 40 units with ReLU activation functions. It is important to note that all base distributions were Gaussian. The PNEs adopted an architecture of three hidden layers each with 50 units and ReLU activation functions. [...] Each member is initialized with different random weights, trained on a bootstrapped subset of the data, and assigned a fixed dropout mask sampled at the start of training (with p = 0.5), similar to [1]. [...] For each experimental setting, PNEs and Nflows Base were executed with five ensemble components. The MC estimator sampled 1000 and 5000 points for Nflows Base and PNEs, respectively, for each x conditioned on.
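
The ensemble configuration quoted in the Experiment Setup row and the acquisition schedule quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not the authors' code: the helper names `make_member` and `acquire` are hypothetical, drawing a same-size bootstrap sample with replacement is an assumption about what "bootstrapped subset" means, and selecting the highest-scoring pool points is an assumed acquisition rule that the quoted text does not confirm. The uncertainty scores are random placeholders standing in for an epistemic uncertainty estimator.

```python
import random

# Values taken from the quoted rows above.
N_MEMBERS = 5         # "five ensemble components"
HIDDEN_LAYERS = 3     # PNE architecture: three hidden layers
HIDDEN_UNITS = 50     # 50 units per hidden layer
DROPOUT_P = 0.5       # dropout probability for the fixed masks
INITIAL_POINTS = 100  # 1D setting; 200 for multi-dimensional environments
BATCH_SIZE = 10       # points added per acquisition batch


def make_member(n_train, rng):
    """Hypothetical per-member setup: bootstrap indices and fixed dropout masks."""
    # Assumption: bootstrap = n_train indices drawn with replacement.
    bootstrap_idx = [rng.randrange(n_train) for _ in range(n_train)]
    # One keep (1) / drop (0) mask per hidden layer, sampled once and then
    # reused for the whole of training rather than resampled per step.
    masks = [
        [0 if rng.random() < DROPOUT_P else 1 for _ in range(HIDDEN_UNITS)]
        for _ in range(HIDDEN_LAYERS)
    ]
    return {"bootstrap_idx": bootstrap_idx, "masks": masks}


def acquire(scores, batch_size=BATCH_SIZE):
    """Assumed rule: return indices of the batch_size highest-scoring pool points."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


rng = random.Random(0)
ensemble = [make_member(INITIAL_POINTS, rng) for _ in range(N_MEMBERS)]

# Placeholder scores for a pool of 50 candidate inputs (random, for illustration).
pool_scores = [rng.random() for _ in range(50)]
chosen = acquire(pool_scores)
train_size_after_one_batch = INITIAL_POINTS + len(chosen)
```

Fixing each member's mask at the start of training gives every member a different sub-network, which is one way, alongside bootstrapping and random initialization, to encourage disagreement between members where data is scarce.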