Improving Regression Performance with Distributional Losses

Authors: Ehsan Imani, Martha White

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we investigate the utility of the HL-Gaussian for regression, compared to using an ℓ2 loss. We particularly investigate why the modification to this distributional loss improves performance, designing experiments to test if it is due to (a) the utility of learning distributions or smoothed targets, (b) a bias-variance trade-off from bin size or variance in the HL-Gaussian, (c) an improved representation, (d) nonlinearity introduced by the HL, and (e) improved optimization properties of the loss. Datasets and pre-processing. All features are transformed to have zero mean and unit variance. We randomly split the data into train and test sets in each run. The CT Position dataset is from CT images of patients (Graf et al., 2011), with 385 features and the target set to the relative location of the image. (See the HL-Gaussian target sketch after this table.)
Researcher Affiliation | Academia | Ehsan Imani and Martha White, Department of Computing Science, University of Alberta, Edmonton. Correspondence to: Martha White <whitem@ualberta.ca>.
Pseudocode | No | The paper describes the methods in text and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | The CT Position dataset is from CT images of patients (Graf et al., 2011); the Song Year dataset is a subset of The Million Song Dataset (Bertin-Mahieux et al., 2011); the Bike Sharing dataset is from Fanaee-T & Gama (2014).
Dataset Splits | No | The paper states "We randomly split the data into train and test sets in each run" but does not explicitly provide details of a validation split (percentages, sample counts, or specific methodology). (See the split-and-scaling sketch after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions Scikit-learn (Pedregosa et al., 2011), Keras (Chollet et al., 2015), and the Adam optimizer (Kingma & Ba, 2014), but does not specify version numbers.
Experiment Setup | Yes | All units employ ReLU activation, except the last layer with linear activations. Unless specified otherwise, all networks using HL have 100 bins. All neural network models are trained with mini-batch size 256 using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3, and the parameters are initialized according to the method suggested by LeCun et al. (1998). Dropout (Srivastava et al., 2014) with rate 0.05 is added to the input layer of all neural networks to avoid overfitting. We trained the networks for 1000 epochs on CT Position, 150 epochs on Song Year, and 500 epochs on Bike Sharing.
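
To make the HL-Gaussian setup quoted under Research Type concrete, the sketch below shows one way to turn scalar regression targets into soft histogram targets: a Gaussian centred on each target is discretized over fixed bins via CDF differences, and the network is then trained with a cross-entropy loss against these targets. This is not the authors' code; the bin range, the number of bins, and the choice of sigma here are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): build HL-Gaussian soft
# targets by discretizing a Gaussian centred on each scalar target over bins.
import numpy as np
from scipy.stats import norm

def hl_gaussian_targets(y, bin_edges, sigma):
    """Return an (n_samples, n_bins) matrix of soft target probabilities."""
    # CDF of N(y, sigma^2) at every bin edge, one row per sample.
    cdf = norm.cdf((bin_edges[None, :] - y[:, None]) / sigma)
    probs = cdf[:, 1:] - cdf[:, :-1]            # Gaussian mass in each bin
    probs /= probs.sum(axis=1, keepdims=True)   # renormalize after truncation
    return probs

# Illustrative usage: 100 bins over targets scaled to [0, 1], with sigma set to
# twice the bin width (an assumption, not a value taken from the paper).
edges = np.linspace(0.0, 1.0, 101)
targets = hl_gaussian_targets(np.array([0.30, 0.72]), edges,
                              sigma=2 * (edges[1] - edges[0]))
```

A point prediction can then be read off as the expectation of the bin centres under the network's predicted histogram.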
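
The Dataset Splits row notes only a random train/test split and zero-mean, unit-variance feature scaling. The sketch below is one plausible reading of that preprocessing; the 80/20 ratio and the synthetic stand-in data are assumptions, not values from the paper.

```python
# Illustrative preprocessing sketch; split ratio and data are assumed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 385))   # stand-in for the CT Position features
y = rng.random(1000)               # stand-in for the relative-location targets

# A fresh random split "in each run"; the 80/20 ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Zero mean and unit variance, as stated in the paper.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```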
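
Finally, a minimal Keras model consistent with the Experiment Setup quote: mini-batches of 256, Adam with learning rate 1e-3, dropout 0.05 on the input layer, ReLU hidden units, and 100 output bins. The hidden-layer widths, the exact LeCun-style initializer string, and the softmax output (for the HL variant; the ℓ2 baseline would instead end in a single linear unit) are assumptions where the quote is silent.

```python
# Minimal Keras sketch of the HL-Gaussian network; layer widths and the exact
# initializer are assumptions, the rest follows the quoted setup.
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_bins = 385, 100          # CT Position features; 100 histogram bins

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dropout(0.05),              # dropout with rate 0.05 on the input layer
    layers.Dense(256, activation="relu", kernel_initializer="lecun_uniform"),
    layers.Dense(256, activation="relu", kernel_initializer="lecun_uniform"),
    layers.Dense(n_bins, activation="softmax"),   # distribution over the bins
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",   # cross-entropy against HL-Gaussian targets
)

# Training schedule from the quote (CT Position): 1000 epochs, mini-batches of 256.
# model.fit(X_train, hl_gaussian_targets(y_train, edges, sigma),
#           batch_size=256, epochs=1000)
```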