Fast Nonlinear Vector Quantile Regression

Authors: Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, Alexander Bronstein

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use four synthetic and four real datasets which are detailed in Appendices D.1 and D.2. Except for the MVN dataset, which was used for the scale and optimization experiments, the remaining three synthetic datasets were carefully selected to be challenging since they exhibit complex nonlinear relationships between X and Y (see e.g. fig. 1b). We evaluate using the following metrics (detailed in Appendix E): (i) KDE-L1, an estimate of distance between distributions; (ii) QFD, a distance measured between an estimated CVQF and its ground truth; (iii) Inverse CVQF entropy; (iv) Monotonicity violations; (v) Marginal coverage; (vi) Size of α-confidence set.
Researcher Affiliation | Collaboration | Aviv A. Rosenberg1,3, Sanketh Vedula1,3, Yaniv Romano1,2, and Alex M. Bronstein1,3; 1Department of Computer Science, Technion; 2Department of Electrical and Computer Engineering, Technion; 3Sibylla, UK
Pseudocode | No | The paper describes methods and procedures in text but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release a feature-rich, well-tested Python package vqr, implementing estimation of vector quantiles, vector ranks, vector quantile contours, linear and nonlinear VQR, and VMR. To the best of our knowledge, this would be the first publicly available tool for estimating conditional vector quantile functions at scale. (Footnote 1: vqr can be installed with pip install vqr; source available at https://github.com/vistalab-technion/vqr)
Open Datasets | Yes | We use four synthetic and four real datasets which are detailed in Appendices D.1 and D.2. The original real datasets contained one-dimensional targets. Feldman et al. (2021) constructed an additional target variable by selecting a feature that has high correlation with the first target variable and low correlation with the other input features, so that it is hard to predict. A summary of these datasets is presented in table A1.
Dataset Splits | Yes | In all the real data experiments, we randomly split the data into 80% training set and 20% hold-out test set.
Hardware Specification | Yes | All experiments were run on a machine with an Intel Xeon E5 CPU, 256GB of RAM and an Nvidia Titan 2080Ti GPU with 11GB dedicated graphics memory.
Software Dependencies | No | The paper mentions using Python packages like 'vqr', 'scipy' with 'qhull', 'pykeops', and the 'POT' library, but it does not specify exact version numbers for these software dependencies.
Experiment Setup | Yes | Synthetic glasses experiment: We set N = 10k, T = 100, and ε = 0.001. We optimized both VQR and NL-VQR for 40k iterations and used a learning rate scheduler that decays the learning rate by a factor of 0.9 every 500 iterations if the error does not drop by 0.5%. Conditional Banana and Rotating Star experiments: We set ε = 0.005 and optimized both VQR and NL-VQR for 20k iterations. We used the same learning rate and schedulers as in the synthetic glasses experiment. Real data experiments: All methods were run for 40k iterations, with the learning rate set to 0.3 and ε = 0.01. We set T = 50 for NL-VQR and VQR baselines, and T = 100 for separable linear and nonlinear QR baselines.
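The "Monotonicity violations" metric listed in the evaluation row above can be illustrated with a toy check. A vector quantile function should be co-monotonic, i.e. (Q(u) − Q(u′)) · (u − u′) ≥ 0 for every pair of quantile levels u, u′; counting sampled pairs that break this inequality yields a violation rate. This is a minimal plain-Python sketch of that idea, not the paper's implementation (Appendix E defines the exact metric):

```python
from itertools import combinations

def monotonicity_violation_rate(us, qs):
    """Fraction of level pairs (u, u') violating the co-monotonicity
    condition (Q(u) - Q(u')) . (u - u') >= 0, given sampled levels
    `us` and the corresponding quantile values `qs` (both lists of tuples)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pairs = list(combinations(range(len(us)), 2))
    violations = sum(
        1 for i, j in pairs
        if dot([q - p for q, p in zip(qs[i], qs[j])],
               [u - v for u, v in zip(us[i], us[j])]) < 0
    )
    return violations / len(pairs)

# The identity map Q(u) = u is trivially co-monotonic.
levels = [(0.1, 0.1), (0.5, 0.2), (0.9, 0.8)]
print(monotonicity_violation_rate(levels, levels))  # -> 0.0
```

A sign-flipped map Q(u) = −u reverses every pairwise dot product and gives a violation rate of 1.0, which is the failure mode this metric is designed to catch.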
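The 80%/20% random split quoted in the Dataset Splits row can be reproduced with a standard shuffled index split. The paper does not publish its splitting code or seed, so the seed below is purely illustrative:

```python
import random

def train_test_split(data, test_frac=0.2, seed=0):
    """Randomly partition `data` into train/test subsets (80/20 by default)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)          # deterministic shuffle for a fixed seed
    n_test = int(len(data) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return [data[i] for i in train_idx], [data[i] for i in test_idx]

train, test = train_test_split(list(range(1000)))
print(len(train), len(test))  # -> 800 200
```

In practice one would shuffle (X, Y) pairs jointly; the single-list version above keeps the sketch short.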
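The learning-rate schedule in the Experiment Setup row (decay by a factor of 0.9 every 500 iterations unless the error improved by at least 0.5%) behaves like a relative-threshold plateau scheduler. This is a self-contained sketch of that rule under those stated hyperparameters, not the authors' code:

```python
class PlateauDecay:
    """Decay lr by `factor` every `interval` steps unless the tracked
    error dropped by at least `rel_threshold` relative to the best seen."""

    def __init__(self, lr=0.3, factor=0.9, interval=500, rel_threshold=0.005):
        self.lr, self.factor = lr, factor
        self.interval, self.rel_threshold = interval, rel_threshold
        self.best = float("inf")
        self.step_count = 0

    def step(self, error):
        self.step_count += 1
        if error < self.best * (1 - self.rel_threshold):
            self.best = error                 # sufficient relative improvement
        elif self.step_count % self.interval == 0:
            self.lr *= self.factor            # plateau: decay the learning rate

sched = PlateauDecay()
for _ in range(1000):                         # constant error: two decay events
    sched.step(1.0)
print(round(sched.lr, 4))  # -> 0.243 (0.3 * 0.9 * 0.9)
```

The same behavior is available off the shelf as torch.optim.lr_scheduler.ReduceLROnPlateau with threshold_mode='rel'; the paper does not say which implementation was used.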