ACDC: A Structured Efficient Linear Layer

Authors: Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas

ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that it can indeed be successfully interleaved with ReLU modules in convolutional neural networks for image recognition. Our experiments also study critical factors in the training of these structured modules, including initialization and depth.
Researcher Affiliation | Collaboration | University of Oxford; NVIDIA; CIFAR
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It shows derivative equations in Section 4, but these are not presented as an algorithm. (A hedged sketch of the ACDC transform follows this table.)
Open Source Code | Yes | A Torch implementation of ACDC is available at https://github.com/mdenil/acdc-torch
Open Datasets | Yes | In particular we use the CaffeNet architecture for ImageNet (Deng et al., 2009).
Dataset Splits | Yes | In particular we use the CaffeNet architecture for ImageNet (Deng et al., 2009). While specific percentages are not given, ImageNet is a standard benchmark with well-defined train/validation splits, implying their usage.
Hardware Specification | Yes | The processor used to benchmark the ACDC layer was an NVIDIA Titan X.
Software Dependencies | No | The paper mentions "The NVIDIA library cuFFT" but does not provide a specific version number for it or for any other software dependency.
Experiment Setup | Yes | The model was trained using SGD with a learning rate of 0.1, multiplied by 0.1 every 100,000 iterations, momentum of 0.65, and weight decay of 0.0005. The output from the last convolutional layer was scaled by 0.1, and the learning rates for the matrices A and D were multiplied by 24 and 12, respectively. All diagonal matrices were initialized from a N(1, 0.061) distribution, and no weight decay was applied to A or D. Additive biases were added to the matrices D, but not to A, as this sufficed to provide the ACDC layer with bias terms just before the ReLU non-linearities; biases were initialized to 0. To prevent overfitting, dropout regularization with probability 0.1 was placed before each of the last 5 SELL layers. (A hedged configuration sketch follows this table.)
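
Although the paper gives no pseudocode, the layer itself is compact enough to sketch. The snippet below is a minimal NumPy/SciPy reading of an ACDC-style transform, not the authors' Torch implementation: two learned diagonal vectors interleaved with a DCT and an inverse DCT replace a dense N x N weight matrix with O(N) parameters and an O(N log N) transform. The exact ordering of the forward and inverse transforms and the orthonormal normalization are assumptions here and should be checked against the paper and the acdc-torch code.

    import numpy as np
    from scipy.fft import dct, idct

    def acdc_forward(x, a, d):
        """ACDC-style structured linear layer (ordering assumed):
        y = diag(a) @ C_inv @ diag(d) @ C @ x, with C the orthonormal DCT."""
        return a * idct(d * dct(x, norm="ortho"), norm="ortho")

    # Toy usage with near-identity diagonals (cf. the N(1, 0.061) initialization above).
    rng = np.random.default_rng(0)
    n = 8
    x = rng.standard_normal(n)
    a = rng.normal(loc=1.0, scale=0.25, size=n)
    d = rng.normal(loc=1.0, scale=0.25, size=n)
    print(acdc_forward(x, a, d))

Stacking several such layers with ReLU modules between them matches the interleaving described in the Research Type row; the acdc-torch repository linked above contains the authors' actual implementation.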
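
The Experiment Setup row also translates fairly directly into optimizer configuration. The sketch below is a hypothetical PyTorch rendering of those settings rather than the authors' original Torch training script; the toy parameter stand-ins and the reading of N(1, 0.061) as mean 1 with standard deviation 0.061 are assumptions.

    import torch
    from torch import nn

    # Toy stand-ins for the diagonal A and D parameters of one ACDC/SELL layer and
    # for the rest of the network; the real CaffeNet-based model is not reproduced here.
    n = 64
    a = nn.Parameter(torch.empty(n).normal_(mean=1.0, std=0.061))  # N(1, 0.061); std assumed
    d = nn.Parameter(torch.empty(n).normal_(mean=1.0, std=0.061))
    d_bias = nn.Parameter(torch.zeros(n))            # additive bias attached to D, initialized to 0
    other = nn.Parameter(torch.randn(n, n) * 0.01)   # placeholder for the remaining weights

    base_lr = 0.1
    optimizer = torch.optim.SGD(
        [
            {"params": [a], "lr": base_lr * 24, "weight_decay": 0.0},          # A: lr x24, no decay
            {"params": [d, d_bias], "lr": base_lr * 12, "weight_decay": 0.0},  # D: lr x12, no decay
            {"params": [other], "lr": base_lr, "weight_decay": 5e-4},          # everything else
        ],
        momentum=0.65,
    )

    # Learning rates multiplied by 0.1 every 100,000 iterations
    # (scheduler.step() called once per training iteration).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100_000, gamma=0.1)

    # Dropout with p = 0.1, placed before each of the last 5 SELL layers in the reported setup.
    dropout = nn.Dropout(p=0.1)

The 0.1 scaling of the last convolutional layer's output is not shown; in this reading it would simply be a fixed multiplicative constant applied between the convolutional stack and the first ACDC layer.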