Tree Cross Attention
Authors: Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that Tree Cross Attention (TCA) performs comparably to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient. Furthermore, we compare ReTreever against Perceiver IO, showing significant gains while using the same number of tokens for inference. |
| Researcher Affiliation | Collaboration | Leo Feng, Mila, Université de Montréal & Borealis AI, leo.feng@mila.quebec; Frederick Tung, Borealis AI, frederick.tung@borealisai.com; Hossein Hajimirsadeghi, Borealis AI, hossein.hajimirsadeghi@borealisai.com; Yoshua Bengio, Mila, Université de Montréal, yoshua.bengio@mila.quebec; Mohamed Osama Ahmed, Borealis AI, mohamed.o.ahmed@borealisai.com |
| Pseudocode | Yes | Algorithm 1 (Retrieval); see the hedged retrieval sketch after the table |
| Open Source Code | Yes | The code is available at https://github.com/BorealisAI/tree-cross-attention |
| Open Datasets | Yes | We evaluate ReTreever on popular uncertainty estimation settings used in the (Conditional) Neural Processes literature and which have been benchmarked extensively (Table 13 in Appendix) (Garnelo et al., 2018a;b; Kim et al., 2019; Lee et al., 2020; Nguyen & Grover, 2022; Feng et al., 2023a;b). For the Human Activity dataset, we use the official repository for mTAN (Multi-Time Attention Networks), https://github.com/reml-lab/mTAN. The Human Activity dataset is available at the same link. |
| Dataset Splits | No | The paper describes the data generation and evaluation procedures for tasks like GP Regression and Image Completion, and mentions evaluation on test sets (e.g., 'evaluated on its accuracy on 3200 randomly generated test sequences' for Copy Task). However, it does not explicitly state specific train/validation/test dataset splits (e.g., as percentages or absolute sample counts) or references to predefined validation splits. |
| Hardware Specification | Yes | Our experiments were run using a mix of Nvidia GTX 1080 Ti (12 GB) or Nvidia Tesla P100 (16 GB) GPUs. |
| Software Dependencies | No | The paper mentions using the official repositories for TNPs and LBANPs (whose names imply a PyTorch stack) and an Adam optimizer, but it does not specify version numbers for Python, PyTorch, CUDA, or other key software components required for replication. |
| Experiment Setup | Yes | ReTreever uses 6 layers in the encoder for all experiments. Our aggregator function is a Self Attention (Transformer) module whose output is averaged. ... We found that simply setting λ_RL = λ_CA = 1.0 and α = 0.01 worked well for all tasks. We used an Adam optimizer with a standard learning rate of 5e-4. All experiments were run with 3 seeds. ... dropout was set to 0.1 for the Copy Task for all methods. (A hedged training-setup sketch follows the retrieval sketch below.) |
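
For orientation, here is a minimal, hypothetical sketch of the retrieval step the Pseudocode row refers to (Algorithm 1). The `TreeNode` structure, the dot-product `score_fn`, and the greedy argmax descent are our assumptions, not the authors' implementation; in the paper, the branch-selection policy is learned with reinforcement learning rather than chosen greedily.

```python
import torch

class TreeNode:
    """Node in a token tree: leaves hold input tokens, internal nodes
    hold learned aggregates (summaries) of their children."""
    def __init__(self, value, children=None):
        self.value = value              # (d,) summary vector for this node
        self.children = children or []  # empty list => leaf

def retrieve(root, query, score_fn):
    """Greedy stand-in for Algorithm 1 (Retrieval): descend from the
    root, following the child whose summary scores highest against the
    query, while keeping the summaries of the branches not taken so the
    final cross attention still covers the whole set at a coarse scale."""
    retrieved, node = [], root
    while node.children:
        scores = torch.stack([score_fn(query, c.value) for c in node.children])
        best = int(scores.argmax())
        # Retain summaries of the non-selected siblings at this level.
        retrieved.extend(c.value for i, c in enumerate(node.children) if i != best)
        node = node.children[best]
    retrieved.append(node.value)        # the selected leaf token
    return torch.stack(retrieved)       # O(log N) tokens instead of N

# Toy usage: a balanced binary tree over 4 tokens, dot-product scoring.
d = 8
leaves = [TreeNode(torch.randn(d)) for _ in range(4)]
internal = [TreeNode(leaves[0].value + leaves[1].value, leaves[:2]),
            TreeNode(leaves[2].value + leaves[3].value, leaves[2:])]
root = TreeNode(sum(n.value for n in internal), internal)
tokens = retrieve(root, torch.randn(d), lambda q, v: q @ v)
print(tokens.shape)  # torch.Size([3, 8]): 2 sibling summaries + 1 leaf
```

Cross attention over the full set would attend to all 4 tokens here; the tree path yields 3, and the gap widens logarithmically as the set grows, which is the token-efficiency claim in the Research Type row.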
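
The Experiment Setup row reports the loss weights and optimizer but not the full objective. Below is a hedged reconstruction of that setup: only the hyperparameters (λ_RL = λ_CA = 1.0, α = 0.01, Adam with lr 5e-4, dropout 0.1 on the Copy Task) come from the quoted text. The two-layer model and the individual loss terms are stand-in stubs, and α's exact role is not stated in the quote; it is assumed here to weight an auxiliary regularizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reported hyperparameters.
lambda_rl, lambda_ca, alpha = 1.0, 1.0, 0.01

# Stand-in model; the real encoder is 6 layers with a Self Attention
# aggregator, which this stub does not reproduce.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                      nn.Dropout(0.1),          # 0.1 reported for the Copy Task
                      nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

x, y = torch.randn(32, 8), torch.randn(32, 1)   # dummy batch
pred = model(x)

ca_loss = F.mse_loss(pred, y)     # stand-in for the prediction loss L_CA
rl_loss = torch.zeros(())         # placeholder for the RL (policy) term L_RL
aux_reg = torch.zeros(())         # assumed role of alpha: an auxiliary
                                  # regularizer weight (not confirmed by the quote)
loss = lambda_ca * ca_loss + lambda_rl * rl_loss + alpha * aux_reg

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Repeating this loop with 3 random seeds and averaging, as the paper reports, would complete the replication recipe up to the unspecified software versions noted in the Software Dependencies row.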