Principled Weight Initialisation for Input-Convex Neural Networks

Authors: Pieter-Jan Hoedt, Günter Klambauer

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our principled initialisation strategy by training ICNNs on three sets of experiments. In our first experiments, we investigate the effect of initialisation on learning dynamics and generalisation in ICNNs on multiple permuted image datasets. We also include non-convex networks in these experiments to illustrate that ICNNs with our principled initialisation can be trained as well as regular networks. |
| Researcher Affiliation | Academia | Pieter-Jan Hoedt & Günter Klambauer, LIT AI Lab & ELLIS Unit Linz, Institute for Machine Learning, Johannes Kepler University, Linz, Austria, {hoedt, klambauer}@ml.jku.at |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code for figures and experiments can be found at https://github.com/ml-jku/convex-init |
| Open Datasets | Yes | Concretely, we trained fully-connected ICNNs on MNIST (Bottou et al., 1994), CIFAR10 and CIFAR100 (Krizhevsky, 2009), to which we refer as permuted image benchmarks (cf. Goodfellow et al., 2014). [...] More specifically, we train ICNNs on the Tox21 challenge data (Huang et al., 2016; Mayr et al., 2016; Klambauer et al., 2017). |
| Dataset Splits | Yes | To this end we perform a grid-search to find the hyper-parameters that attain the best accuracy after 25 epochs of training on a random validation split, for each of the four compared methods. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The learning rate for the Adam optimiser was obtained by manually tuning on the non-convex baseline. [...] Hyper-parameters were selected by a manual search on the non-convex baseline. We ended up using a fully-connected network with two hidden layers of 128 neurons and ReLU activations. The network was regularised with fifty percent dropout after every hidden layer, as well as seventy percent dropout of the inputs. |
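For context on the "Research Type" row: the input-convex neural networks (ICNNs) being trained constrain their hidden-to-hidden weights to be non-negative, which is the property the paper's initialisation has to account for. Below is a minimal sketch of a generic ICNN-style block in PyTorch (in the spirit of Amos et al.); it is not the authors' initialisation scheme, and the layer sizes and clamping approach are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICNNBlock(nn.Module):
    """Generic input-convex block: z_{k+1} = relu(W_z z_k + W_x x + b),
    where W_z is kept non-negative so the output is convex in the input x."""

    def __init__(self, in_dim: int, hidden_dim: int, n_layers: int = 2):
        super().__init__()
        # direct ("skip") connections from the input to every layer
        self.input_maps = nn.ModuleList(
            nn.Linear(in_dim, hidden_dim) for _ in range(n_layers)
        )
        # hidden-to-hidden weights; these must stay non-negative
        self.convex_maps = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim, bias=False) for _ in range(n_layers - 1)
        )

    def clamp_convex_weights(self):
        # call after each optimiser step to enforce the convexity constraint
        with torch.no_grad():
            for layer in self.convex_maps:
                layer.weight.clamp_(min=0.0)

    def forward(self, x):
        z = F.relu(self.input_maps[0](x))
        for inp, cvx in zip(self.input_maps[1:], self.convex_maps):
            # convex, non-decreasing activation preserves convexity in x
            z = F.relu(cvx(z) + inp(x))
        return z
```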
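The "Open Datasets" row refers to permuted image benchmarks. A common construction, sketched below with torchvision, flattens each image and applies one fixed pixel permutation so that only fully-connected models are meaningful; the exact preprocessing used in the paper may differ, and the data root path is a placeholder.

```python
import torch
from torchvision import datasets, transforms

# one fixed pixel permutation shared by all images (standard "permuted MNIST" setup)
pixel_perm = torch.randperm(28 * 28)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img.view(-1)[pixel_perm]),  # flatten + permute pixels
])

# "data/" is a placeholder root directory
train_set = datasets.MNIST("data/", train=True, download=True, transform=transform)
```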
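The "Dataset Splits" row mentions selecting hyper-parameters by grid search on a random validation split, scored by accuracy after 25 epochs. The sketch below is a hypothetical version of that selection loop: the grid values, the `train_and_evaluate` helper, and the validation fraction are assumptions, not values from the paper.

```python
from itertools import product

from torch.utils.data import random_split

# hypothetical grid; the searched values are not listed in the quoted excerpt
GRID = {
    "lr": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5],
}


def select_hyperparameters(dataset, train_and_evaluate, val_fraction=0.1, epochs=25):
    """Return the configuration with the best validation accuracy after `epochs`.

    `train_and_evaluate(train_set, val_set, epochs, **config) -> float` is an
    assumed helper that trains one model and reports validation accuracy.
    """
    n_val = int(len(dataset) * val_fraction)
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])

    best_config, best_acc = None, float("-inf")
    for values in product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        acc = train_and_evaluate(train_set, val_set, epochs, **config)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc
```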
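The "Experiment Setup" row describes the non-convex baseline: two hidden layers of 128 ReLU units, 70% dropout on the inputs, 50% dropout after every hidden layer, trained with Adam. A minimal PyTorch sketch of that configuration follows; the input and output dimensions and the learning rate are placeholders, since the excerpt does not report them.

```python
import torch
import torch.nn as nn


def build_baseline(in_features: int, n_outputs: int) -> nn.Sequential:
    """Fully-connected baseline: 70% input dropout, two 128-unit ReLU layers,
    50% dropout after each hidden layer."""
    return nn.Sequential(
        nn.Dropout(p=0.7),              # input dropout
        nn.Linear(in_features, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(128, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(128, n_outputs),
    )


# in_features, n_outputs, and lr are illustrative placeholders
model = build_baseline(in_features=1024, n_outputs=12)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
```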