Improving Regression Performance with Distributional Losses

Authors: Ehsan Imani, Martha White

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we investigate the utility of the HL-Gaussian for regression, compared to using an ℓ2 loss. We particularly investigate why the modification to this distributional loss improves performance, designing experiments to test if it is due to (a) the utility of learning distributions or smoothed targets, (b) a bias-variance trade-off from bin size or variance in the HL-Gaussian, (c) an improved representation, (d) nonlinearity introduced by the HL, and (e) improved optimization properties of the loss. Datasets and pre-processing. All features are transformed to have zero mean and unit variance. We randomly split the data into train and test sets in each run. The CT Position dataset is from CT images of patients (Graf et al., 2011), with 385 features and the target set to the relative location of the image. (See the HL-Gaussian target sketch after this table.)
Researcher Affiliation | Academia | Ehsan Imani and Martha White, Department of Computing Science, University of Alberta, Edmonton. Correspondence to: Martha White <whitem@ualberta.ca>.
Pseudocode | No | The paper describes the methods in text and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | The CT Position dataset is from CT images of patients (Graf et al., 2011); the Song Year dataset is a subset of The Million Song Dataset (Bertin-Mahieux et al., 2011); the Bike Sharing dataset is from Fanaee-T & Gama (2014).
Dataset Splits | No | The paper states "We randomly split the data into train and test sets in each run" but does not explicitly provide details of a validation split (percentages, sample counts, or specific methodology). (See the split-and-scaling sketch after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions Scikit-learn (Pedregosa et al., 2011), Keras (Chollet et al., 2015), and the Adam optimizer (Kingma & Ba, 2014), but does not specify version numbers.
Experiment Setup | Yes | All units employ ReLU activation, except the last layer with linear activations. Unless specified otherwise, all networks using HL have 100 bins. All neural network models are trained with mini-batch size 256 using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3, and the parameters are initialized according to the method suggested by LeCun et al. (1998). Dropout (Srivastava et al., 2014) with rate 0.05 is added to the input layer of all neural networks to avoid overfitting. We trained the networks for 1000 epochs on CT Position, 150 epochs on Song Year, and 500 epochs on Bike Sharing.
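
To make the HL-Gaussian setup quoted under Research Type concrete, the sketch below shows one way to turn scalar regression targets into soft histogram targets: a Gaussian centred on each target is discretized over fixed bins via CDF differences, and the network is then trained with a cross-entropy loss against these targets. This is not the authors' code; the bin range, the number of bins, and the choice of sigma here are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): build HL-Gaussian soft
# targets by discretizing a Gaussian centred on each scalar target over bins.
import numpy as np
from scipy.stats import norm

def hl_gaussian_targets(y, bin_edges, sigma):
    """Return an (n_samples, n_bins) matrix of soft target probabilities."""
    # CDF of N(y, sigma^2) at every bin edge, one row per sample.
    cdf = norm.cdf((bin_edges[None, :] - y[:, None]) / sigma)
    probs = cdf[:, 1:] - cdf[:, :-1]            # Gaussian mass in each bin
    probs /= probs.sum(axis=1, keepdims=True)   # renormalize after truncation
    return probs

# Illustrative usage: 100 bins over targets scaled to [0, 1], with sigma set to
# twice the bin width (an assumption, not a value taken from the paper).
edges = np.linspace(0.0, 1.0, 101)
targets = hl_gaussian_targets(np.array([0.30, 0.72]), edges,
                              sigma=2 * (edges[1] - edges[0]))
```

A point prediction can then be read off as the expectation of the bin centres under the network's predicted histogram.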
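
The Dataset Splits row notes only a random train/test split and zero-mean, unit-variance feature scaling. The sketch below is one plausible reading of that preprocessing; the 80/20 ratio and the synthetic stand-in data are assumptions, not values from the paper.

```python
# Illustrative preprocessing sketch; split ratio and data are assumed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 385))   # stand-in for the CT Position features
y = rng.random(1000)               # stand-in for the relative-location targets

# A fresh random split "in each run"; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Zero mean and unit variance, as stated in the paper.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```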
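
Finally, a minimal Keras model consistent with the Experiment Setup quote: mini-batches of 256, Adam with learning rate 1e-3, dropout 0.05 on the input layer, ReLU hidden units, and 100 output bins. The hidden-layer widths, the exact LeCun-style initializer string, and the softmax output (for the HL variant; the ℓ2 baseline would instead end in a single linear unit) are assumptions where the quote is silent.

```python
# Minimal Keras sketch of the HL-Gaussian network; layer widths and the exact
# initializer are assumptions, the rest follows the quoted setup.
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_bins = 385, 100          # CT Position features; 100 histogram bins

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dropout(0.05),              # dropout with rate 0.05 on the input layer
    layers.Dense(256, activation="relu", kernel_initializer="lecun_uniform"),
    layers.Dense(256, activation="relu", kernel_initializer="lecun_uniform"),
    layers.Dense(n_bins, activation="softmax"),   # distribution over the bins
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",   # cross-entropy against HL-Gaussian targets
)

# Training schedule from the quote (CT Position): 1000 epochs, mini-batches of 256.
# model.fit(X_train, hl_gaussian_targets(y_train, edges, sigma),
#           batch_size=256, epochs=1000)
```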