Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise
Authors: Thomas Pouplin, Alan Jeffares, Nabeel Seedat, Mihaela van der Schaar
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that it results in superior performance to existing methods when evaluated on standard benchmarks (Section 4). We begin by empirically verifying the predicted gains of the RQR objective over quantile regression on non-symmetric noise distributions. As the noise distribution cannot be known on real-world data, here we generate synthetic data according to a known process. The data is generated according to a data-generating process in which the label is determined according to Y = X + ε, where X ∈ ℝ represents a deterministic component which is set as constant and ε ∈ ℝ is the noise component. Then the noise distribution is selected as either a (symmetric) Gaussian or a (non-symmetric) truncated Gaussian. We fit a simple linear neural network consisting of just a single layer. As illustrated in Table 2, the empirical results match our theoretical expectations. All methods perform equally well on the symmetric Gaussian noise where intervals centered at the median are optimal. However, QR fails to achieve optimal width on the truncated Gaussian due to being arbitrarily centered at the median. Whilst RQR (w/o reg) is unbiased, it is not sufficiently incentivized to produce narrower intervals and thus performs similarly to QR in that regard. In line with our analytic observations that motivated this objective, only RQR-W (with reg) achieves the optimally narrow interval solution. (A minimal sketch of this data-generating process is given after the table.) |
| Researcher Affiliation | Academia | Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK. Correspondence to: Thomas Pouplin <tp531@cam.ac.uk>. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found. The methods are described mathematically and textually. |
| Open Source Code | Yes | Code at https://github.com/TPouplin/RQR. |
| Open Datasets | Yes | We follow standard preprocessing on all datasets with features standardized such that they have zero mean and unit variance and targets divided by their mean. All experiments are repeated over 10 random seeds with means and standard errors of means reported throughout. |
| Dataset Splits | Yes | In the benchmarking of RQR-W: A training-validation-testing split with a ratio [0.6, 0.2, 0.2] is applied. Then we perform a hyperparameter grid search for all methods. Each combination is first fit to the training data with the model selection based on evaluations on the validation set. For a given hyperparameter combination, the best model is selected based on the epoch that achieves the best interval length (such that target coverage is achieved). The reported values are the evaluations on the test set. All methods evaluated follow this protocol. In the benchmarking of RQR-O, we followed the exact procedure and used the implementation of the OQR baseline in Feldman et al. (2021). A training-validation-testing split with a ratio of 0.4 for testing and a further split of 0.9-0.1 for training-validation is applied. The default hyperparameters are used for both methods with only the regularization coefficient tuned. Specifically, it is set to 1 and then decreased in increments following [1, 0.5, 0.1, 0.05, . . .] until the desired coverage is achieved. Both the RQR-O and OQR hyperparameters are learning rate: 1e-3, maximum number of epochs: 10000, dropout probability: 0, and batch size: 1024. Early stopping patience is set to 200 epochs. (The coefficient schedule is sketched after the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) were explicitly mentioned for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers were explicitly mentioned. The paper mentions using the Adam optimizer and neural networks but does not name the libraries used (e.g., PyTorch, TensorFlow) or their versions. |
| Experiment Setup | Yes | We follow standard preprocessing on all datasets with features standardized such that they have zero mean and unit variance and targets divided by their mean. All experiments are repeated over 10 random seeds with means and standard errors of means reported throughout. We train two-layer neural networks with 64 hidden units and ReLU activations throughout (consistent with Feldman et al. (2021)). We also use the Adam optimizer (Kingma & Ba, 2014). In the benchmarking of RQR-W, we applied the following experimental design. A training-validation-testing split with a ratio [0.6, 0.2, 0.2] is applied. Then we perform a hyperparameter grid search for all methods. Each combination is first fit to the training data with the model selection based on evaluations on the validation set. For a given hyperparameter combination, the best model is selected based on the epoch that achieves the best interval length (such that target coverage is achieved). The reported values are the evaluations on the test set. All methods evaluated follow this protocol. The grid search considers the following hyperparameters: dropout probability {0.1, 0.2, 0.3}, learning rate {0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001}, and regularization coefficient {0.01, 0.1, 1, 5, 10, 20, 30, 40, 50}. Other hyperparameters are fixed: number of epochs 400, batch size 10000. (A sketch of this protocol is given after the table.) |
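
The synthetic-noise experiment quoted under Research Type (labels generated as Y = X + ε with either symmetric Gaussian or non-symmetric truncated Gaussian noise, fit with a single-layer linear model) can be illustrated with a minimal sketch. The sample size, the constant value of X, the noise scale, and the truncation point below are assumptions not stated in the quoted text.

```python
import numpy as np

def generate_synthetic(n=10_000, noise="gaussian", seed=0):
    """Sketch of the data-generating process Y = X + eps quoted above.

    X is a deterministic component held constant; eps is either a symmetric
    Gaussian or a (non-symmetric) truncated Gaussian. The constant, the noise
    scale, and the truncation point are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n, 1.0)                  # deterministic component, set constant
    eps = rng.normal(0.0, 1.0, size=n)   # symmetric Gaussian noise
    if noise == "truncated_gaussian":
        eps = np.abs(eps)                # truncating at the mean gives an asymmetric half-normal
    return x.reshape(-1, 1), x + eps

# Example: the two noise settings compared in Table 2 of the paper.
X_sym, y_sym = generate_synthetic(noise="gaussian")
X_asym, y_asym = generate_synthetic(noise="truncated_gaussian")
```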
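The RQR-W protocol quoted under Experiment Setup (features standardized to zero mean and unit variance, targets divided by their mean, a [0.6, 0.2, 0.2] train/validation/test split, two-layer 64-unit ReLU networks trained with Adam, and a grid over dropout, learning rate, and regularization coefficient) could be set up roughly as follows. The paper does not name a deep-learning framework, so PyTorch is used here as an assumption; whether the normalization statistics are computed on the training split alone is also an assumption, and the RQR loss itself is omitted.

```python
from itertools import product

import numpy as np
import torch.nn as nn

def preprocess_and_split(X, y, seed=0):
    """[0.6, 0.2, 0.2] train/val/test split; features standardized to zero mean
    and unit variance, targets divided by their mean (statistics taken on the
    training split here, which is an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_val = int(0.6 * len(X)), int(0.2 * len(X))
    tr, val, te = idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]
    X = (X - X[tr].mean(0)) / X[tr].std(0)
    y = y / y[tr].mean()
    return (X[tr], y[tr]), (X[val], y[val]), (X[te], y[te])

def make_model(in_dim, dropout):
    """Two-layer network with 64 hidden units and ReLU activations; the two
    outputs (lower and upper interval bounds) are an assumption."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(64, 2),
    )

# Grid quoted in the report; epochs (400) and batch size (10000) are fixed.
GRID = list(product(
    [0.1, 0.2, 0.3],                                   # dropout probability
    [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001],   # learning rate
    [0.01, 0.1, 1, 5, 10, 20, 30, 40, 50],             # regularization coefficient
))
```

Each grid combination would then be trained for 400 epochs with Adam, selecting the epoch with the best validation interval length among those that reach the target coverage, as described in the quoted protocol.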
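Finally, the RQR-O / OQR tuning rule quoted under Dataset Splits (start the regularization coefficient at 1 and decrease it along [1, 0.5, 0.1, 0.05, ...] until the desired coverage is achieved) amounts to a small search loop. `fit_and_evaluate` below is a hypothetical stand-in for training with a given coefficient and returning the achieved validation coverage.

```python
def coefficient_schedule(n=10):
    """Yield 1, 0.5, 0.1, 0.05, 0.01, ... by alternately dividing by 2 and 5."""
    coef = 1.0
    for i in range(n):
        yield coef
        coef /= 2 if i % 2 == 0 else 5

def tune_regularization(fit_and_evaluate, target_coverage=0.9):
    """Decrease the coefficient along the schedule until the target coverage
    is reached. `fit_and_evaluate(coef)` is a hypothetical callable that
    trains with the given coefficient and returns the achieved coverage."""
    for coef in coefficient_schedule():
        if fit_and_evaluate(coef) >= target_coverage:
            return coef
    return None  # target coverage not reached within the schedule
```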