Compositional Attention: Disentangling Search and Retrieval
Authors: Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. |
| Researcher Affiliation | Academia | Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie (Mila, Université de Montréal) |
| Pseudocode | No | The paper describes the mechanism using mathematical equations and computation graphs (Figure 2) but does not include a labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Open-sourced implementation is available at https://github.com/sarthmit/Compositional-Attention |
| Open Datasets | Yes | Sort-of-CLEVR (Santoro et al., 2017) is a Visual Question-Answering (VQA) task... We perform experiments on the Wiki Text-103 data corpus (Merity et al., 2016)... We pose the problem of image classification across four different datasets CIFAR10, Fashion MNIST, SVHN and Equilateral Triangle Detection as a multi-task learning setup. |
| Dataset Splits | Yes | The corpus consists of 28,475 articles in its training split and 60 in the validation and test splits, respectively |
| Hardware Specification | No | The paper mentions running experiments on 'GPUs' and discusses FLOPs, but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using 'fairseq codebase' and 'pytorch-Op Counter' but does not specify version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use a 4-layered transformer with shared parameters and ablate with transformer dimensions of 32, 256, and 512 and FFN dimensions of 64, 512, and 1024 respectively. We consider baselines with 4 and 8 heads, and for the proposed model we use 4 searches and ablate over 1 to 4 retrievals. We use 32 dimensions for the retrieval query and key. We train the model with a learning rate of 0.0001 for 100 epochs. |
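
As a companion to the Experiment Setup row above, below is a minimal PyTorch sketch of the search/retrieval factorization the paper describes, instantiated with the quoted hyperparameters (4 searches, a small number of retrievals, 32-dimensional retrieval query/key). The projection layout, the source of the retrieval query and key, and all module and argument names are assumptions made for illustration; the authors' actual implementation is in the linked repository (https://github.com/sarthmit/Compositional-Attention).

```python
# Hedged sketch of the search/retrieval factorization; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompositionalAttention(nn.Module):
    """Sketch: S searches produce attention matrices, R retrievals produce values,
    and a soft selection decides which retrieval each search composes with."""

    def __init__(self, dim, n_searches=4, n_retrievals=4, qk_dim=32):
        super().__init__()
        assert dim % n_searches == 0
        self.S, self.R, self.qk_dim = n_searches, n_retrievals, qk_dim
        self.head_dim = dim // n_searches
        # Search: per-search query/key projections, as in standard multi-head attention.
        self.q_proj = nn.Linear(dim, n_searches * self.head_dim)
        self.k_proj = nn.Linear(dim, n_searches * self.head_dim)
        # Retrieval: value projections that are not tied to any particular search.
        self.v_proj = nn.Linear(dim, n_retrievals * self.head_dim)
        # Retrieval query (from the input) and key (from each candidate retrieval)
        # used to softly pair searches with retrievals; exact sourcing is an assumption.
        self.rq_proj = nn.Linear(dim, n_searches * qk_dim)
        self.rk_proj = nn.Linear(self.head_dim, qk_dim)
        self.out_proj = nn.Linear(n_searches * self.head_dim, dim)

    def forward(self, x):
        B, T, _ = x.shape
        d, S, R = self.head_dim, self.S, self.R
        q = self.q_proj(x).view(B, T, S, d).transpose(1, 2)            # (B, S, T, d)
        k = self.k_proj(x).view(B, T, S, d).transpose(1, 2)            # (B, S, T, d)
        v = self.v_proj(x).view(B, T, R, d).transpose(1, 2)            # (B, R, T, d)

        # Search: one attention matrix per search head.
        attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)   # (B, S, T, T)

        # Candidate retrievals: apply every search matrix to every set of values.
        cand = torch.einsum('bstu,brud->bsrtd', attn, v)               # (B, S, R, T, d)

        # Soft selection: a retrieval query per search scores each candidate retrieval.
        rq = self.rq_proj(x).view(B, T, S, self.qk_dim).transpose(1, 2)  # (B, S, T, qk)
        rk = self.rk_proj(cand)                                        # (B, S, R, T, qk)
        sel = F.softmax(torch.einsum('bstq,bsrtq->bsrt', rq, rk)
                        / self.qk_dim ** 0.5, dim=2)                   # (B, S, R, T)
        out = (sel.unsqueeze(-1) * cand).sum(dim=2)                    # (B, S, T, d)
        return self.out_proj(out.transpose(1, 2).reshape(B, T, -1))
```

A configuration matching the middle ablation quoted above would be, for example, `CompositionalAttention(dim=256, n_searches=4, n_retrievals=4, qk_dim=32)`; the released code may make different choices for value dimensions, normalization, and where the retrieval query and key are computed from.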