Regularising Non-linear Models Using Feature Side-information

Authors: Amina Mollaysa, Pablo Strasser, Alexandros Kalousis

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments on a number of benchmark datasets which show significant predictive performance gains, over a number of baselines, as a result of the exploitation of the side-information.
Researcher Affiliation | Academia | University of Applied Sciences, Western Switzerland; University of Geneva.
Pseudocode | No | The paper presents its analytical and stochastic approximation methods in prose and equations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide any information or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We evaluated both approaches on the eight document classification datasets used in (Kusner et al., 2015). As feature side-information we use the word2vec representation of the words, which has a dimensionality of 300 (Mikolov et al., 2013).
Dataset Splits | Yes | We used early stopping where we keep 20% of the training data as the validation set. For those datasets without a predefined train/test split (BBCsport, Twitter, Classic, Amazon, Recipe), we use five-fold cross validation and report the average error.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU/CPU models, memory details).
Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2014) and word2vec (Mikolov et al., 2013) but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We used α = 0.001, β1 = 0.9, β2 = 0.999 for one hidden layer networks, and α = 0.0001 for the networks with more hidden layers. We initialize all network parameters using (Glorot & Bengio, 2010). For the analytical model we set the maximum number of iterations to 5000. For the stochastic model we set the maximum number of iterations to 10000 for the one layer networks and to 20000 for networks with more layers. We used early stopping where we keep 20% of the training data as the validation set. We select the λ hyperparameters of AN, ST, and ℓ2 from {10^k | k = -3, ..., 3}; we select the λ of dropout from [0.1, 0.2, 0.3, 0.4, 0.5]. We set the c in the augmentation process, which controls the size of the neighborhood within which the output constraints should hold, to one. For the analytical model we set the mini-batch size m to five. For the stochastic model, as well as for all the baseline models, we set the mini-batch size to 20. In the experiments we set p = 5.
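
The "Dataset Splits" row quotes a protocol of holding out 20% of the training data as a validation set for early stopping and, for datasets without a predefined train/test split, running five-fold cross validation and averaging the error. The following is a minimal sketch of that protocol, assuming scikit-learn and NumPy are available; the train_and_score callback is a hypothetical stand-in for training one model with early stopping and returning its test error.

import numpy as np
from sklearn.model_selection import KFold, train_test_split

def evaluate_with_cv(X, y, train_and_score, n_folds=5, val_fraction=0.2, seed=0):
    # Five-fold cross validation; within each fold, 20% of the training part
    # is held out as a validation set used only for early stopping.
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in kf.split(X):
        X_tr, X_te = X[train_idx], X[test_idx]
        y_tr, y_te = y[train_idx], y[test_idx]
        X_fit, X_val, y_fit, y_val = train_test_split(
            X_tr, y_tr, test_size=val_fraction, random_state=seed)
        errors.append(train_and_score(X_fit, y_fit, X_val, y_val, X_te, y_te))
    # Report the average error over the folds, as the paper does.
    return float(np.mean(errors))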
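
The "Experiment Setup" row lists optimizer settings, initialization, hyperparameter grids, and batch sizes. The sketch below collects those reported values in PyTorch form; the network architecture and the input, hidden, and output dimensions are placeholders of my own and not taken from the paper, while the hyperparameter values themselves come from the row above.

import torch
import torch.nn as nn

def build_mlp(in_dim, hidden_dim, out_dim, n_hidden=1):
    # One-hidden-layer network with Glorot (Xavier) initialization
    # (Glorot & Bengio, 2010), as stated in the setup.
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden_dim), nn.ReLU()]
        d = hidden_dim
    layers.append(nn.Linear(d, out_dim))
    model = nn.Sequential(*layers)
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)
    return model

# Placeholder dimensions; the actual sizes depend on the dataset.
model = build_mlp(in_dim=1000, hidden_dim=100, out_dim=8)

# Adam as reported: alpha = 0.001, beta1 = 0.9, beta2 = 0.999 for one
# hidden layer (alpha = 0.0001 for networks with more hidden layers).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Hyperparameter grids quoted in the row above.
lambda_grid = [10.0 ** k for k in range(-3, 4)]   # {10^k | k = -3, ..., 3}
dropout_grid = [0.1, 0.2, 0.3, 0.4, 0.5]

# Remaining reported settings.
batch_size_analytical = 5     # mini-batch size m for the analytical model
batch_size_stochastic = 20    # stochastic model and all baseline models
max_iters_analytical = 5000
max_iters_one_layer = 10000   # stochastic model, one hidden layer
max_iters_deeper = 20000      # stochastic model, more hidden layers
c_neighborhood = 1            # neighborhood size in the augmentation process
p = 5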