GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers
Authors: Takeru Miyato, Bernhard Jaeger, Max Welling, Andreas Geiger
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead. |
| Researcher Affiliation | Academia | 1 University of Tübingen, Tübingen AI Center; 2 University of Amsterdam |
| Pseudocode | Yes | Algorithm 1 provides an algorithmic description based on Eq. (6) for single-head self-attention. |
| Open Source Code | Yes | Code: https://github.com/autonomousvision/gta. |
| Open Datasets | Yes | We evaluate our method on two synthetic 360° datasets with sparse and wide baseline views (CLEVR-TR and MSN-Hard) and on two datasets of real scenes with distant views (Real Estate10k and ACID). |
| Dataset Splits | Yes | Fig. 11: Mean and standard deviation plots of validation PSNRs on CLEVR-TR and MSN-Hard. |
| Hardware Specification | Yes | We train with 4 RTX 2080 Ti GPUs on CLEVR-TR and with 4 Nvidia A100 GPUs on the other datasets. |
| Software Dependencies | No | The paper mentions the optimizer (AdamW), normalization techniques (RMSNorm), and the use of bfloat16 and float32 precision, but does not specify versions for major software frameworks (e.g., PyTorch, TensorFlow) or other libraries. |
| Experiment Setup | Yes | Table 15 shows dataset properties and hyperparameters that we use in our experiments. We train each model for 2M and 4M iterations on CLEVR-TR and MSN-Hard, respectively, and for 300K iterations on both Real Estate10k and ACID. Learning rate 1e-4. Batch size 32. |
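
As a companion to the Pseudocode row above: the quoted Algorithm 1 / Eq. (6) describe single-head self-attention in which attention logits and values are modulated by relative geometric transformations between tokens. The NumPy sketch below only illustrates that idea; it is not the authors' implementation from the linked repository, and the function name `gta_self_attention`, the tensor shapes, and the toy rotation representations are all illustrative assumptions.

```python
import numpy as np

def gta_self_attention(q, k, v, reps):
    """Single-head attention modulated by geometric transforms (illustrative sketch).

    q, k, v : (n, d) token features.
    reps    : (n, d, d) one representation matrix per token, e.g. built from
              that token's camera pose.
    Both the attention logits and the aggregated values use the relative
    transform rep(g_i)^{-1} rep(g_j), so the result depends only on the
    relative geometry between tokens.
    """
    n, d = q.shape
    inv_reps = np.linalg.inv(reps)  # rep(g_i)^{-1} for every token i
    out = np.zeros_like(v)
    for i in range(n):
        rel = inv_reps[i] @ reps  # (n, d, d): rep(g_i^{-1} g_j) for all j
        # logits_j = q_i^T rep(g_i^{-1} g_j) k_j, scaled as in standard attention
        logits = np.einsum('d,ndk,nk->n', q[i], rel, k) / np.sqrt(d)
        attn = np.exp(logits - logits.max())
        attn /= attn.sum()
        # o_i = sum_j attn_j * rep(g_i^{-1} g_j) v_j
        out[i] = np.einsum('n,ndk,nk->d', attn, rel, v)
    return out

# Toy usage: block-diagonal 2-D rotation representations from per-token angles
# (purely illustrative; GTA uses representations derived from camera geometry).
rng = np.random.default_rng(0)
n, d = 4, 6
angles = rng.uniform(0, 2 * np.pi, n)
blocks = [np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]]) for a in angles]
reps = np.stack([np.kron(np.eye(d // 2), b) for b in blocks])
q, k, v = rng.normal(size=(3, n, d))
print(gta_self_attention(q, k, v, reps).shape)  # (4, 6)
```

Because every term in the sketch involves rep(g_i)^{-1} rep(g_j) = rep(g_i^{-1} g_j), the computation depends only on relative transforms between tokens and is unaffected by a global change of the world coordinate frame, which is the kind of geometric awareness the quoted abstract attributes to GTA.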