LambdaNetworks: Modeling long-range Interactions without Attention
Authors: Irwan Bello
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LambdaNetworks on computer vision tasks where works using self-attention are hindered by large memory costs (Wang et al., 2018; Bello et al., 2019), suffer impractical implementations (Ramachandran et al., 2019), or require vast amounts of data (Dosovitskiy et al., 2020). In our experiments spanning ImageNet classification, COCO object detection and instance segmentation, LambdaNetworks significantly outperform their convolutional and attentional counterparts, while being more computationally efficient and faster than the latter. |
| Researcher Affiliation | Industry | Irwan Bello Google Research, Brain team ibello@google.com |
| Pseudocode | Yes | Figure 3: Pseudo-code for the multi-query lambda layer. The position embeddings can be made to satisfy various conditions, such as translation equivariance, when computing positional lambdas (not shown). The lambda layer can be adapted to other tasks/modalities by adjusting the choice of embeddings (Section A.2). A hedged reimplementation sketch of this layer is given after the table. |
| Open Source Code | No | The paper mentions 'We refer the reader to the code for more details' but does not provide a specific link or explicit statement about open-source code availability in the main text or supplementary materials. |
| Open Datasets | Yes | We evaluate lambda layers on standard computer vision benchmarks: ImageNet classification (Deng et al., 2009), COCO object detection and instance segmentation (Lin et al., 2014). |
| Dataset Splits | Yes | We tuned our models using a held-out validation set comprising 2% of the ImageNet training set (20 shards out of 1024). We perform early stopping on the held-out validation set for the largest models, starting with LambdaResNet-350 at resolution 288x288, and simply report the final accuracies for the smaller models. |
| Hardware Specification | Yes | LambdaResNets reach excellent accuracies on ImageNet while being 3.2–4.4x faster than the popular EfficientNets on modern machine learning accelerators. [...] Figure 4 presents the speed-accuracy Pareto curve of LambdaResNets compared to EfficientNets (Tan & Le, 2019) on TPUv3 hardware. [...] Training and inference throughput is shown for 8 TPUv3 cores. |
| Software Dependencies | No | In our experiments using TensorFlow 1.x on TPUv3 hardware, we found both the n-d depthwise and (n+1)-d convolution implementations to have similar speed. |
| Experiment Setup | Yes | We consider two training setups for the ImageNet classification task. The 90 epochs training setup trains models for 90 epochs via backpropagation using SGD with momentum 0.9. The batch size B is 4096 distributed across 32 TPUv3 cores and the weight decay is set to 1e-4. The learning rate is scaled linearly from 0 to 0.1·B/256 for 5 epochs and then decayed using the cosine schedule. We use batch normalization with decay 0.9999 and exponential moving average with weight 0.9999 over trainable parameters and a label smoothing of 0.1. The input image size is set to 224x224. We use standard training data augmentation (random crops and horizontal flip with 50% probability). A sketch of this learning-rate schedule follows the table. |
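
To make the pseudocode claim concrete, here is a minimal NumPy sketch of the multi-query lambda layer, following the einsum equations of the paper's Figure 3. The function and variable names are ours; the softmax normalization of the keys and the shape conventions (b, n, m, k, v, h) follow the paper's notation, and the translation-equivariance constraint on the position embeddings is omitted, as in the original figure.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lambda_layer(queries, keys, embeddings, values):
    """Multi-query lambda layer (NumPy sketch).

    Shapes, following the paper's notation:
      queries:    [b, h, n, k]   h query heads share each lambda
      keys:       [b, m, k]
      embeddings: [n, m, k]      relative position embeddings E_n
      values:     [b, m, v]
    Returns:
      output:     [b, n, h * v]
    """
    b, h, n, k = queries.shape
    v = values.shape[-1]
    # Normalize keys with a softmax over the m context positions.
    keys = softmax(keys, axis=1)
    # Content lambda: a single k x v matrix per example.
    content_lambda = np.einsum('bmk,bmv->bkv', keys, values)
    # Position lambdas: one k x v matrix per query position n.
    position_lambdas = np.einsum('nmk,bmv->bnkv', embeddings, values)
    # Apply the lambdas to the queries and merge the heads.
    content_out = np.einsum('bhnk,bkv->bnhv', queries, content_lambda)
    position_out = np.einsum('bhnk,bnkv->bnhv', queries, position_lambdas)
    return (content_out + position_out).reshape(b, n, h * v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b, n, m, k, v, h = 2, 16, 16, 8, 4, 4
    out = lambda_layer(rng.normal(size=(b, h, n, k)),
                       rng.normal(size=(b, m, k)),
                       rng.normal(size=(n, m, k)),
                       rng.normal(size=(b, m, v)))
    print(out.shape)  # (2, 16, 16), i.e. [b, n, d] with d = h * v
```

The multi-query trick, tying one k x v lambda to h query heads, is what keeps the layer's memory footprint small relative to self-attention: no n x m attention map is ever materialized.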
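The 90-epoch learning-rate rule quoted above (linear warmup to 0.1·B/256 over 5 epochs, then cosine decay) can also be stated compactly in code. The per-step formulation below is our assumption; the paper specifies the schedule per epoch, and the function name and arguments are illustrative.

```python
import math

def learning_rate(step, steps_per_epoch, batch_size=4096,
                  base_lr=0.1, warmup_epochs=5, total_epochs=90):
    """Warmup-then-cosine schedule sketch for the 90-epoch ImageNet setup."""
    peak_lr = base_lr * batch_size / 256          # 1.6 for B = 4096
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        return peak_lr * step / warmup_steps      # linear warmup from 0
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With B = 4096 the peak rate works out to 0.1 * 4096 / 256 = 1.6, matching the linear scaling rule in the quoted setup.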