The HSIC Bottleneck: Deep Learning without Back-Propagation
Authors: Wan-Duo Kurt Ma, J. P. Lewis, W. Bastiaan Kleijn (pp. 5085-5092)
Venue: AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that the HSIC bottleneck provides performance on MNIST/Fashion MNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. In this section, we report several experiments that explore and validate the HSIC-trained network concept. |
| Researcher Affiliation | Collaboration | Victoria University {mawand, bastiaan.kleijn}@ecs.vuw.ac.nz, jplewis@google.com |
| Pseudocode | Yes | Algorithm 1: Unformatted-training (an illustrative sketch follows the table) |
| Open Source Code | Yes | Our code is available at https://github.com/choasma/HSIC-Bottleneck |
| Open Datasets | Yes | For the experiments, we used standard feedforward networks with batch-normalization (Ioffe and Szegedy 2015) on the MNIST/Fashion MNIST/CIFAR10 datasets. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, only mentioning 'Typically the minibatch size is a constant that is chosen based on validation performance'. |
| Hardware Specification | No | The paper mentions 'available GPU memory' and 'on a GPU' in the context of HSIC complexity, but does not provide specific hardware details such as GPU models, CPU types, or other computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using 'standard feedforward networks with batch-normalization' and a 'simple SGD optimizer', but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | All experiments including standard backpropagation, unformatted-trained, and format-trained, use a simple SGD optimizer. The coefficient β and the kernel scale factor σ of the HSIC-bottleneck were set to 500 and 5 respectively, which empirically balances compression and the relevant information available for the classification task. We use a fully connected network architecture 784-256-256-256-256-256-10 with ReLU activation functions. |
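
The Experiment Setup row pins down the reported architecture and optimizer. The following is a minimal sketch, assuming PyTorch, of a 784-256-256-256-256-256-10 fully connected network with batch normalization and ReLU trained with plain SGD; the learning rate and any other values not quoted above are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

def make_block(n_in, n_out):
    # Linear -> BatchNorm -> ReLU, as described for the feedforward networks
    return nn.Sequential(nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.ReLU())

# Reported architecture: 784-256-256-256-256-256-10
widths = [784, 256, 256, 256, 256, 256]
layers = [make_block(widths[i], widths[i + 1]) for i in range(len(widths) - 1)]
layers.append(nn.Linear(256, 10))  # output layer for the 10 classes
model = nn.Sequential(*layers)

# "Simple SGD optimizer" per the paper; the learning rate here is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```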
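For the Pseudocode row, Algorithm 1 (unformatted training) optimizes each hidden representation Z against an HSIC-bottleneck objective, roughly HSIC(Z, X) − β·HSIC(Z, Y) with β = 500 and kernel scale σ = 5 as quoted above. The sketch below uses the standard biased empirical HSIC estimator with a Gaussian kernel; the exact estimator, kernel scaling, and function names are assumptions for illustration rather than the authors' implementation.

```python
import torch

def gaussian_kernel(x, sigma=5.0):
    # Pairwise Gaussian kernel matrix for a (batch, features) tensor.
    d2 = torch.cdist(x, x) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=5.0):
    # Biased empirical HSIC: (m - 1)^{-2} * tr(Kx H Ky H),
    # where H is the centering matrix. Inputs are flattened to (batch, features).
    m = x.shape[0]
    h = torch.eye(m) - torch.ones(m, m) / m
    kx, ky = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    return torch.trace(kx @ h @ ky @ h) / (m - 1) ** 2

def hsic_bottleneck_loss(z, x, y, beta=500.0, sigma=5.0):
    # Per-layer unformatted-training objective: compress away the input
    # while retaining dependence on the (one-hot) labels.
    return hsic(z, x, sigma) - beta * hsic(z, y, sigma)
```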