Principled Weight Initialisation for Input-Convex Neural Networks
Authors: Pieter-Jan Hoedt, Günter Klambauer
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our principled initialisation strategy by training ICNNs on three sets of experiments. In our first experiments, we investigate the effect of initialisation on learning dynamics and generalisation in ICNNs on multiple permuted image datasets. We also include non-convex networks in these experiments to illustrate that ICNNs with our principled initialisation can be trained as well as regular networks. |
| Researcher Affiliation | Academia | Pieter-Jan Hoedt & Günter Klambauer, LIT AI Lab & ELLIS Unit Linz, Institute for Machine Learning, Johannes Kepler University, Linz, Austria; {hoedt, klambauer}@ml.jku.at |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code for figures and experiments can be found at https://github.com/ml-jku/convex-init |
| Open Datasets | Yes | Concretely, we trained fully-connected ICNNs on MNIST (Bottou et al., 1994), CIFAR10 and CIFAR100 (Krizhevsky, 2009), to which we refer as permuted image benchmarks (cf. Goodfellow et al., 2014). [...] More specifically, we train ICNNs on the Tox21 challenge data (Huang et al., 2016; Mayr et al., 2016; Klambauer et al., 2017). |
| Dataset Splits | Yes | To this end we perform a grid-search to find the hyper-parameters that attain the best accuracy after 25 epochs of training on a random validation split, for each of the four compared methods. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The learning rate for the Adam optimiser was obtained by manually tuning on the non-convex baseline. [...] Hyper-parameters were selected by a manual search on the non-convex baseline. We ended up using a fully-connected network with two hidden layers of 128 neurons and ReLU activations. The network was regularised with fifty percent dropout after every hidden layer, as well as seventy percent dropout of the inputs. |
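
The Experiment Setup row above quotes a concrete baseline configuration (two hidden layers of 128 ReLU units, 50% dropout after each hidden layer, 70% input dropout, Adam with a manually tuned learning rate). Below is a minimal sketch of that configuration, assuming PyTorch; the framework choice, the input/output dimensions, and the default Adam learning rate are illustrative assumptions and are not taken from the paper.

```python
import torch
from torch import nn

# Hypothetical input/output sizes (e.g. a Tox21-style multi-task setting);
# the paper does not specify these values here.
n_features, n_tasks = 1644, 12

baseline = nn.Sequential(
    nn.Dropout(p=0.7),           # seventy percent dropout of the inputs
    nn.Linear(n_features, 128),  # first hidden layer, 128 neurons
    nn.ReLU(),
    nn.Dropout(p=0.5),           # fifty percent dropout after every hidden layer
    nn.Linear(128, 128),         # second hidden layer, 128 neurons
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, n_tasks),
)

# The paper tunes the Adam learning rate manually on this non-convex baseline;
# the default value here is a placeholder, not the tuned setting.
optimiser = torch.optim.Adam(baseline.parameters())
```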