Multiplication-Free Transformer Training via Piecewise Affine Operations
Authors: Atli Kosson, Martin Jaggi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that transformers can be trained with piecewise affine matrix multiplications on both vision and language data with little to no performance impact. We compare this to AdderNet-based transformers [30], demonstrating better accuracy while replacing more multiplications. |
| Researcher Affiliation | Academia | Atli Kosson, Martin Jaggi; EPFL, Switzerland; firstname.lastname@epfl.ch |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | We publicly release our code2, including custom kernels, in the hopes of aiding further research into multiplication-free neural networks. 2Code available at https://github.com/epfml/piecewise-affine-multiplication |
| Open Datasets | Yes | The first one is German to English translation on the IWSLT14 DE-EN dataset [3]... We train on either CIFAR10 [19] or the ImageNet-1k [6] dataset. |
| Dataset Splits | Yes | CIFAR10 consists of 50K training and 10K test images of size 32×32 corresponding to 10 classes. |
| Hardware Specification | Yes | We use PyTorch [28] for our experiments and run them using either Nvidia A100 (40GB) or V100 (32GB) GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch [28]', 'fairseq [27]', and the 'PyTorch Image Models project [36]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Our baseline setup trains for 20 epochs, using a cosine decay schedule with 4000 warmup steps and a peak learning rate of 5×10⁻⁴ with a maximum batch size of 4096 tokens. We use AdamW [21, 17] for optimization with β1 = 0.9, β2 = 0.98 and weight decay of 10⁻⁴. During training we apply dropout with drop probability 0.3 and use cross entropy with label smoothing of 0.1. |
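
For reference, the quoted training recipe maps onto a standard PyTorch optimizer/scheduler configuration. The sketch below is a minimal illustration, assuming a stand-in model and an assumed total step count (the real model and step budget come from the paper's fairseq IWSLT14 setup); it is not the authors' training script.

```python
import math
import torch

# Minimal sketch of the quoted recipe: AdamW (beta1=0.9, beta2=0.98, weight decay 1e-4),
# peak LR 5e-4, 4000 warmup steps, cosine decay, cross entropy with label smoothing 0.1.
# `model` and `total_steps` are hypothetical placeholders, not values from the paper.
model = torch.nn.Linear(512, 512)          # stand-in for the transformer
total_steps, warmup_steps = 50_000, 4_000  # total_steps is assumed
peak_lr = 5e-4

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.98), weight_decay=1e-4
)

def lr_lambda(step):
    # Linear warmup to the peak learning rate, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```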
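
The piecewise affine matrix multiplications themselves are defined in the paper and its released code; this report does not reproduce that construction. As a loose illustration only, the sketch below shows one well-known piecewise-affine surrogate for multiplication, approximating the product of two floats by adding their integer bit representations (a Mitchell/Mogami-style log-domain trick); it may differ in detail from the paper's operation.

```python
import numpy as np

def pam_approx(a, b):
    """Piecewise-affine approximation of elementwise a * b (illustrative only).

    Adds the int32 bit patterns of the absolute float32 inputs and removes the
    duplicated exponent bias (0x3F800000, the bit pattern of 1.0f); the result
    decodes to roughly |a * b|. Zeros, infinities and denormals are not handled.
    """
    sign = np.sign(a) * np.sign(b)
    ia = np.abs(a).astype(np.float32).view(np.int32)
    ib = np.abs(b).astype(np.float32).view(np.int32)
    prod_bits = ia + ib - np.int32(0x3F800000)
    return sign * prod_bits.view(np.float32)

a = np.array([1.5, 3.0, -0.25], dtype=np.float32)
b = np.array([2.0, 0.7,  4.00], dtype=np.float32)
print(pam_approx(a, b))  # about [3.0, 1.9, -1.0]; exact products are [3.0, 2.1, -1.0]
```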