Autoregressive Quantile Flows for Predictive Uncertainty Estimation
Authors: Phillip Si, Allan Bishop, Volodymyr Kuleshov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate quantile flow regression and its extensions against several baselines, including Mixture Density Networks (MDN; Bishop (1994)), Gaussian regression, Quantile regression (QRO; in which we fit a separate estimator for 99 quantiles indexed by α = 0.01, ..., 0.99). We report calibration errors (as in Kuleshov et al. (2018)), check scores (CHK), the CRPS (as in Gneiting and Raftery (2007)) and L1 loss (MAE); both calibration error (defined in appendix) and check score are indexed by α = 0.01, ..., 0.99. Additional experiments on synthetic datasets can be found in the appendix. |
| Researcher Affiliation | Academia | Phillip Si, Allan Bishop, Volodymyr Kuleshov Department of Computer Science, Cornell Tech and Cornell University {ps789, adb262, vk379}@cornell.edu |
| Pseudocode | Yes | See our time series experiments for details, as well as the appendix for pseudocode of the sampling algorithm. |
| Open Source Code | No | The paper mentions using 'publicly available code' from Mashlakov et al. (2021) for their experimental setup, but it does not provide a statement or link for the source code specific to the methodology described in *this* paper. |
| Open Datasets | Yes | We used four benchmark UCI regression datasets (Dua and Graff, 2017) varying in size from 308-1030 instances and 6-10 continuous features. ... We also conducted bounding box retrieval experiments against Gaussian baselines on the data corpus VOC 2007, obtained from Everingham et al. ... We use the 2011-2014 Electricity Load dataset, a popular dataset for benchmarking time series models. ... generating images of digits from sklearn. |
| Dataset Splits | No | The paper states 'We randomly hold out 25% of data for testing' for UCI datasets, which defines a training and test split. While it later refers to a 'validation set' in the appendix in the context of a metric definition, it does not provide specific details on how this validation set is created or its size for the general experiments, nor does it specify a comprehensive train/validation/test split across all experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU models, memory, cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions the use of certain models and architectures (e.g., LSTM, FC layer, ReLU activation), but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific solver versions). |
| Experiment Setup | Yes | QFR and CDFR were trained with learning rates of 3e-3 and 3e-4 and dropout rates of (0.2, 0.1) respectively, as described in Section 5.2. All of the models were two-hidden-layer neural networks with hidden layer size 64 and ReLU activation functions. ... The original Gaussian method was trained for a total of 60 epochs, with a learning rate of 2e-4 which is decayed by a factor of 10 at epochs 30 and 45. |
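The check score and calibration error referenced in the Research Type row are standard quantile-based metrics, both indexed over α = 0.01, ..., 0.99. A minimal sketch of how they could be computed is below; the function names and the squared-gap form of the calibration error are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

ALPHAS = np.arange(0.01, 1.00, 0.01)  # 99 quantile levels: 0.01, ..., 0.99

def check_score(y, q_pred, alphas=ALPHAS):
    """Average pinball (check) loss over quantile levels.

    y:      (n,) observed targets
    q_pred: (n, k) predicted quantiles, one column per level in alphas
    """
    diff = y[:, None] - q_pred                       # (n, k) residuals
    loss = np.maximum(alphas * diff, (alphas - 1.0) * diff)
    return float(loss.mean())

def calibration_error(y, q_pred, alphas=ALPHAS):
    """Mean squared gap between empirical and nominal coverage
    (one common form of the calibration metric in Kuleshov et al. (2018))."""
    empirical = (y[:, None] <= q_pred).mean(axis=0)  # observed coverage per level
    return float(np.mean((empirical - alphas) ** 2))
```

A perfectly calibrated forecaster's empirical coverage matches each nominal level α, so `calibration_error` approaches zero; `check_score` additionally rewards sharp (narrow) quantile bands, which is why the two metrics are reported together.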