Tree Cross Attention
Authors: Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that Tree Cross Attention (TCA) performs comparably to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient. Furthermore, we compare ReTreever against Perceiver IO, showing significant gains while using the same number of tokens for inference. |
| Researcher Affiliation | Collaboration | Leo Feng, Mila, Université de Montréal & Borealis AI, leo.feng@mila.quebec; Frederick Tung, Borealis AI, frederick.tung@borealisai.com; Hossein Hajimirsadeghi, Borealis AI, hossein.hajimirsadeghi@borealisai.com; Yoshua Bengio, Mila, Université de Montréal, yoshua.bengio@mila.quebec; Mohamed Osama Ahmed, Borealis AI, mohamed.o.ahmed@borealisai.com |
| Pseudocode | Yes | Algorithm 1 (Retrieval); see the hedged retrieval sketch after the table |
| Open Source Code | Yes | The code is available at https://github.com/BorealisAI/tree-cross-attention |
| Open Datasets | Yes | We evaluate ReTreever on popular uncertainty estimation settings used in the (Conditional) Neural Processes literature and which have been benchmarked extensively (Table 13 in Appendix) (Garnelo et al., 2018a;b; Kim et al., 2019; Lee et al., 2020; Nguyen & Grover, 2022; Feng et al., 2023a;b). For the Human Activity dataset, we use the official repository for mTAN (Multi-Time Attention Networks), https://github.com/reml-lab/mTAN. The Human Activity dataset is available at the same link. |
| Dataset Splits | No | The paper describes the data generation and evaluation procedures for tasks like GP Regression and Image Completion, and mentions evaluation on test sets (e.g., 'evaluated on its accuracy on 3200 randomly generated test sequences' for Copy Task). However, it does not explicitly state specific train/validation/test dataset splits (e.g., as percentages or absolute sample counts) or references to predefined validation splits. |
| Hardware Specification | Yes | Our experiments were run using a mix of Nvidia GTX 1080 Ti (12 GB) or Nvidia Tesla P100 (16 GB) GPUs. |
| Software Dependencies | No | The paper mentions using the official repositories for TNPs and LBANPs (whose names imply a PyTorch stack) and an Adam optimizer, but it does not specify version numbers for Python, PyTorch, CUDA, or other key software components required for replication. |
| Experiment Setup | Yes | ReTreever uses 6 layers in the encoder for all experiments. Our aggregator function is a Self Attention (Transformer) module whose output is averaged. ... We found that simply setting λ_RL = λ_CA = 1.0 and α = 0.01 worked well for all tasks. We used an Adam optimizer with a standard learning rate of 5e-4. All experiments were run with 3 seeds. ... dropout was set to 0.1 for the Copy Task for all methods. (A hedged training-setup sketch follows the retrieval sketch below.) |
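
For orientation, here is a minimal, hypothetical sketch of the retrieval step the Pseudocode row refers to (Algorithm 1). The `TreeNode` structure, the dot-product `score_fn`, and the greedy argmax descent are our assumptions, not the authors' implementation; in the paper, the branch-selection policy is learned with reinforcement learning rather than chosen greedily.

```python
import torch

class TreeNode:
    """Node in a token tree: leaves hold input tokens, internal nodes
    hold learned aggregates (summaries) of their children."""
    def __init__(self, value, children=None):
        self.value = value              # (d,) summary vector for this node
        self.children = children or []  # empty list => leaf

def retrieve(root, query, score_fn):
    """Greedy stand-in for Algorithm 1 (Retrieval): descend from the
    root, following the child whose summary scores highest against the
    query, while keeping the summaries of the branches not taken so the
    final cross attention still covers the whole set at a coarse scale."""
    retrieved, node = [], root
    while node.children:
        scores = torch.stack([score_fn(query, c.value) for c in node.children])
        best = int(scores.argmax())
        # Retain summaries of the non-selected siblings at this level.
        retrieved.extend(c.value for i, c in enumerate(node.children) if i != best)
        node = node.children[best]
    retrieved.append(node.value)        # the selected leaf token
    return torch.stack(retrieved)       # O(log N) tokens instead of N

# Toy usage: a balanced binary tree over 4 tokens, dot-product scoring.
d = 8
leaves = [TreeNode(torch.randn(d)) for _ in range(4)]
internal = [TreeNode(leaves[0].value + leaves[1].value, leaves[:2]),
            TreeNode(leaves[2].value + leaves[3].value, leaves[2:])]
root = TreeNode(sum(n.value for n in internal), internal)
tokens = retrieve(root, torch.randn(d), lambda q, v: q @ v)
print(tokens.shape)  # torch.Size([3, 8]): 2 sibling summaries + 1 leaf
```

Cross attention over the full set would attend to all 4 tokens here; the tree path yields 3, and the gap widens logarithmically as the set grows, which is the token-efficiency claim in the Research Type row.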
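
The Experiment Setup row reports the loss weights and optimizer but not the full objective. Below is a hedged reconstruction of that setup: only the hyperparameters (λ_RL = λ_CA = 1.0, α = 0.01, Adam with lr 5e-4, dropout 0.1 on the Copy Task) come from the quoted text. The two-layer model and the individual loss terms are stand-in stubs, and α's exact role is not stated in the quote; it is assumed here to weight an auxiliary regularizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reported hyperparameters.
lambda_rl, lambda_ca, alpha = 1.0, 1.0, 0.01

# Stand-in model; the real encoder is 6 layers with a Self Attention
# aggregator, which this stub does not reproduce.
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                      nn.Dropout(0.1),          # 0.1 reported for the Copy Task
                      nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

x, y = torch.randn(32, 8), torch.randn(32, 1)   # dummy batch
pred = model(x)

ca_loss = F.mse_loss(pred, y)     # stand-in for the prediction loss L_CA
rl_loss = torch.zeros(())         # placeholder for the RL (policy) term L_RL
aux_reg = torch.zeros(())         # assumed role of alpha: an auxiliary
                                  # regularizer weight (not confirmed by the quote)
loss = lambda_ca * ca_loss + lambda_rl * rl_loss + alpha * aux_reg

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Repeating this loop with 3 random seeds and averaging, as the paper reports, would complete the replication recipe up to the unspecified software versions noted in the Software Dependencies row.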