Improving Regression Performance with Distributional Losses
Authors: Ehsan Imani, Martha White
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we investigate the utility of the HL-Gaussian for regression, compared to using an ℓ2 loss. We particularly investigate why the modification to this distributional loss improves performance, designing experiments to test if it is due to (a) the utility of learning distributions or smoothed targets, (b) a bias-variance trade-off from bin size or variance in the HL-Gaussian, (c) an improved representation, (d) nonlinearity introduced by the HL and (e) improved optimization properties of the loss. Datasets and pre-processing. All features are transformed to have zero mean and unit variance. We randomly split the data into train and test sets in each run. The CT Position dataset is from CT images of patients (Graf et al., 2011), with 385 features and the target set to the relative location of the image. (A hedged sketch of the HL-Gaussian target construction and loss follows the table.) |
| Researcher Affiliation | Academia | Ehsan Imani¹, Martha White¹; ¹Department of Computing Science, University of Alberta, Edmonton. Correspondence to: Martha White <whitem@ualberta.ca>. |
| Pseudocode | No | The paper describes the methods in text and mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | Yes | The CT Position dataset is from CT images of patients (Graf et al., 2011); the Song Year dataset is a subset of The Million Song Dataset (Bertin-Mahieux et al., 2011); the Bike Sharing dataset (Fanaee-T & Gama, 2014). |
| Dataset Splits | No | The paper states "We randomly split the data into train and test sets in each run" but does not explicitly provide details for a validation split (percentages, sample counts, or specific methodology). |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Scikit-learn (Pedregosa et al., 2011)", "Keras (Chollet et al., 2015)", and the "Adam optimizer (Kingma & Ba, 2014)", but does not specify their version numbers. |
| Experiment Setup | Yes | All units employ ReLU activation, except the last layer with linear activations. Unless specified otherwise, all networks using HL have 100 bins. All neural network models are trained with mini-batch size 256 using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3, and the parameters are initialized according to the method suggested by LeCun et al. (1998). Dropout (Srivastava et al., 2014) with rate 0.05 is added to the input layer of all neural networks to avoid overfitting. We trained the networks for 1000 epochs on CT Position, 150 epochs on Song Year and 500 epochs on Bike Sharing. (A hedged Keras-style sketch of this training configuration follows the table.) |
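
The Research Type row quotes the paper's HL-Gaussian setup: each scalar target is replaced by a truncated Gaussian discretized over a fixed set of bins, and the network's predicted histogram is trained with cross-entropy against that target distribution. Below is a minimal NumPy/SciPy sketch of that target construction and loss; it is not the authors' code, and the bin range, example bin count, and sigma value are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of HL-Gaussian targets:
# a truncated Gaussian centered at the scalar target y, discretized over
# fixed bin edges, used as the cross-entropy target for a softmax output.
import numpy as np
from scipy.stats import norm

def hl_gaussian_targets(y, bin_edges, sigma):
    """Probability mass an N(y, sigma^2) places in each bin, renormalized
    so the truncated distribution sums to one per sample."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)        # (n, 1)
    cdf = norm.cdf((bin_edges[None, :] - y) / sigma)     # (n, num_bins + 1)
    mass = cdf[:, 1:] - cdf[:, :-1]                      # Gaussian mass per bin
    return mass / mass.sum(axis=1, keepdims=True)        # renormalize truncation

def histogram_loss(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between the target histogram and the predicted histogram."""
    return -np.mean(np.sum(target_probs * np.log(pred_probs + eps), axis=1))

# Example: 100 bins over a standardized target range; sigma roughly one bin width.
edges = np.linspace(-3.0, 3.0, 101)
targets = hl_gaussian_targets([0.2, -1.5], edges, sigma=0.06)
```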
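
The Experiment Setup row gives the optimizer, initialization, dropout, and batch size, but not the hidden-layer widths, so the following is a hedged sketch assuming a Keras-style API (the library the paper mentions). The `hidden_units` widths and the `build_l2_baseline` name are hypothetical; the ReLU hidden layers, linear output, LeCun-style initialization, Adam with learning rate 1e-3, and input-layer dropout of 0.05 follow the quoted description.

```python
# Minimal sketch (not the authors' released code) of the quoted training
# configuration for the l2 baseline network, assuming a Keras-style API.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

def build_l2_baseline(input_dim, hidden_units=(64, 64)):  # widths are hypothetical
    """ReLU hidden layers, linear output, trained with mean squared error."""
    model = Sequential()
    model.add(Dropout(0.05, input_shape=(input_dim,)))    # dropout 0.05 on the input layer
    for units in hidden_units:
        model.add(Dense(units, activation='relu',
                        kernel_initializer='lecun_uniform'))  # LeCun-style init
    model.add(Dense(1, activation='linear',
                    kernel_initializer='lecun_uniform'))
    model.compile(optimizer=Adam(1e-3), loss='mean_squared_error')
    return model

# For HL-Gaussian, the head would instead be a 100-way softmax trained with
# categorical cross-entropy against the discretized Gaussian targets
# (see the previous sketch). Training per the quoted setup, e.g. CT Position:
# model.fit(X_train, y_train, batch_size=256, epochs=1000)
```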