Autoregressive Quantile Flows for Predictive Uncertainty Estimation
Authors: Phillip Si, Allan Bishop, Volodymyr Kuleshov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate quantile flow regression and its extensions against several baselines, including Mixture Density Networks (MDN; Bishop (1994)), Gaussian regression, Quantile regression (QRO; in which we fit a separate estimator for 99 quantiles indexed by α = 0.01, ..., 0.99). We report calibration errors (as in Kuleshov et al. (2018)), check scores (CHK), the CRPS (as in Gneiting and Raftery (2007)) and L1 loss (MAE); both calibration error (defined in appendix) and check score are indexed by α = 0.01, ..., 0.99. Additional experiments on synthetic datasets can be found in the appendix. |
| Researcher Affiliation | Academia | Phillip Si, Allan Bishop, Volodymyr Kuleshov Department of Computer Science, Cornell Tech and Cornell University {ps789, adb262, vk379}@cornell.edu |
| Pseudocode | Yes | See our time series experiments for details, as well as the appendix for pseudocode of the sampling algorithm. |
| Open Source Code | No | The paper mentions using 'publicly available code' from Mashlakov et al. (2021) for their experimental setup, but it does not provide a statement or link for the source code specific to the methodology described in *this* paper. |
| Open Datasets | Yes | We used four benchmark UCI regression datasets (Dua and Graff, 2017) varying in size from 308-1030 instances and 6-10 continuous features. ... We also conducted bounding box retrieval experiments against Gaussian baselines on the data corpus VOC 2007, obtained from Everingham et al. ... We use the 2011-2014 Electricity Load dataset, a popular dataset for benchmarking time series models. ... generating images of digits from sklearn. |
| Dataset Splits | No | The paper states 'We randomly hold out 25% of data for testing' for UCI datasets, which defines a training and test split. While it later refers to a 'validation set' in the appendix in the context of a metric definition, it does not provide specific details on how this validation set is created or its size for the general experiments, nor does it specify a comprehensive train/validation/test split across all experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU models, memory, cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions the use of certain models and architectures (e.g., LSTM, FC layer, ReLU activation), but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific solver versions). |
| Experiment Setup | Yes | QFR and CDFR were trained with learning rates of 3e-3 and 3e-4 and dropout rates of (0.2, 0.1) respectively, as described in Section 5.2. All of the models were two-hidden-layer neural networks with hidden layer size 64 and ReLU activation functions. ... The original Gaussian method was trained for a total of 60 epochs, with a learning rate of 2e-4 which is decayed by a factor of 10 at epochs 30 and 45. |
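The check score and calibration error referenced in the Research Type row are standard quantile-based metrics, both indexed over α = 0.01, ..., 0.99. A minimal sketch of how they could be computed is below; the function names and the squared-gap form of the calibration error are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

ALPHAS = np.arange(0.01, 1.00, 0.01)  # 99 quantile levels: 0.01, ..., 0.99

def check_score(y, q_pred, alphas=ALPHAS):
    """Average pinball (check) loss over quantile levels.

    y:      (n,) observed targets
    q_pred: (n, k) predicted quantiles, one column per level in alphas
    """
    diff = y[:, None] - q_pred                       # (n, k) residuals
    loss = np.maximum(alphas * diff, (alphas - 1.0) * diff)
    return float(loss.mean())

def calibration_error(y, q_pred, alphas=ALPHAS):
    """Mean squared gap between empirical and nominal coverage
    (one common form of the calibration metric in Kuleshov et al. (2018))."""
    empirical = (y[:, None] <= q_pred).mean(axis=0)  # observed coverage per level
    return float(np.mean((empirical - alphas) ** 2))
```

A perfectly calibrated forecaster's empirical coverage matches each nominal level α, so `calibration_error` approaches zero; `check_score` additionally rewards sharp (narrow) quantile bands, which is why the two metrics are reported together.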