LambdaNetworks: Modeling long-range Interactions without Attention
Authors: Irwan Bello
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LambdaNetworks on computer vision tasks where works using self-attention are hindered by large memory costs (Wang et al., 2018; Bello et al., 2019), suffer impractical implementations (Ramachandran et al., 2019), or require vast amounts of data (Dosovitskiy et al., 2020). In our experiments spanning ImageNet classification, COCO object detection and instance segmentation, LambdaNetworks significantly outperform their convolutional and attentional counterparts, while being more computationally efficient and faster than the latter. |
| Researcher Affiliation | Industry | Irwan Bello Google Research, Brain team ibello@google.com |
| Pseudocode | Yes | Figure 3: Pseudo-code for the multi-query lambda layer. The position embeddings can be made to satisfy various conditions, such as translation equivariance, when computing positional lambdas (not shown). The lambda layer can be adapted to other tasks/modalities by adjusting the choice of embeddings (Section A.2). A hedged reimplementation sketch of this layer is given after the table. |
| Open Source Code | No | The paper mentions 'We refer the reader to the code for more details' but does not provide a specific link or explicit statement about open-source code availability in the main text or supplementary materials. |
| Open Datasets | Yes | We evaluate lambda layers on standard computer vision benchmarks: ImageNet classification (Deng et al., 2009), COCO object detection and instance segmentation (Lin et al., 2014). |
| Dataset Splits | Yes | We tuned our models using a held-out validation set comprising 2% of the ImageNet training set (20 shards out of 1024). We perform early stopping on the held-out validation set for the largest models, starting with LambdaResNet-350 at resolution 288x288, and simply report the final accuracies for the smaller models. |
| Hardware Specification | Yes | LambdaResNets reach excellent accuracies on ImageNet while being 3.2–4.4x faster than the popular EfficientNets on modern machine learning accelerators. [...] Figure 4 presents the speed-accuracy Pareto curve of LambdaResNets compared to EfficientNets (Tan & Le, 2019) on TPUv3 hardware. [...] Training and inference throughput is shown for 8 TPUv3 cores. |
| Software Dependencies | No | In our experiments using TensorFlow 1.x on TPUv3 hardware, we found both the n-d depthwise and (n+1)-d convolution implementations to have similar speed. |
| Experiment Setup | Yes | We consider two training setups for the ImageNet classification task. The 90 epochs training setup trains models for 90 epochs via backpropagation using SGD with momentum 0.9. The batch size B is 4096 distributed across 32 TPUv3 cores and the weight decay is set to 1e-4. The learning rate is scaled linearly from 0 to 0.1·B/256 for 5 epochs and then decayed using the cosine schedule. We use batch normalization with decay 0.9999 and exponential moving average with weight 0.9999 over trainable parameters and a label smoothing of 0.1. The input image size is set to 224x224. We use standard training data augmentation (random crops and horizontal flip with 50% probability). A sketch of this learning-rate schedule follows the table. |
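
To make the pseudocode claim concrete, here is a minimal NumPy sketch of the multi-query lambda layer, following the einsum equations of the paper's Figure 3. The function and variable names are ours; the softmax normalization of the keys and the shape conventions (b, n, m, k, v, h) follow the paper's notation, and the translation-equivariance constraint on the position embeddings is omitted, as in the original figure.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lambda_layer(queries, keys, embeddings, values):
    """Multi-query lambda layer (NumPy sketch).

    Shapes, following the paper's notation:
      queries:    [b, h, n, k]   h query heads share each lambda
      keys:       [b, m, k]
      embeddings: [n, m, k]      relative position embeddings E_n
      values:     [b, m, v]
    Returns:
      output:     [b, n, h * v]
    """
    b, h, n, k = queries.shape
    v = values.shape[-1]
    # Normalize keys with a softmax over the m context positions.
    keys = softmax(keys, axis=1)
    # Content lambda: a single k x v matrix per example.
    content_lambda = np.einsum('bmk,bmv->bkv', keys, values)
    # Position lambdas: one k x v matrix per query position n.
    position_lambdas = np.einsum('nmk,bmv->bnkv', embeddings, values)
    # Apply the lambdas to the queries and merge the heads.
    content_out = np.einsum('bhnk,bkv->bnhv', queries, content_lambda)
    position_out = np.einsum('bhnk,bnkv->bnhv', queries, position_lambdas)
    return (content_out + position_out).reshape(b, n, h * v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b, n, m, k, v, h = 2, 16, 16, 8, 4, 4
    out = lambda_layer(rng.normal(size=(b, h, n, k)),
                       rng.normal(size=(b, m, k)),
                       rng.normal(size=(n, m, k)),
                       rng.normal(size=(b, m, v)))
    print(out.shape)  # (2, 16, 16), i.e. [b, n, d] with d = h * v
```

The multi-query trick, tying one k x v lambda to h query heads, is what keeps the layer's memory footprint small relative to self-attention: no n x m attention map is ever materialized.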
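The 90-epoch learning-rate rule quoted above (linear warmup to 0.1·B/256 over 5 epochs, then cosine decay) can also be stated compactly in code. The per-step formulation below is our assumption; the paper specifies the schedule per epoch, and the function name and arguments are illustrative.

```python
import math

def learning_rate(step, steps_per_epoch, batch_size=4096,
                  base_lr=0.1, warmup_epochs=5, total_epochs=90):
    """Warmup-then-cosine schedule sketch for the 90-epoch ImageNet setup."""
    peak_lr = base_lr * batch_size / 256          # 1.6 for B = 4096
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        return peak_lr * step / warmup_steps      # linear warmup from 0
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With B = 4096 the peak rate works out to 0.1 * 4096 / 256 = 1.6, matching the linear scaling rule in the quoted setup.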