GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers

Authors: Takeru Miyato, Bernhard Jaeger, Max Welling, Andreas Geiger

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead. |
| Researcher Affiliation | Academia | 1 University of Tübingen, Tübingen AI Center; 2 University of Amsterdam |
| Pseudocode | Yes | Algorithm 1 provides an algorithmic description based on Eq. (6) for single-head self-attention. |
| Open Source Code | Yes | Code: https://github.com/autonomousvision/gta. |
| Open Datasets | Yes | We evaluate our method on two synthetic 360° datasets with sparse and wide baseline views (CLEVR-TR and MSN-Hard) and on two datasets of real scenes with distant views (RealEstate10k and ACID). |
| Dataset Splits | Yes | Fig. 11: Mean and standard deviation plots of validation PSNRs on CLEVR-TR and MSN-Hard. |
| Hardware Specification | Yes | We train with 4 RTX 2080 Ti GPUs on CLEVR-TR and with 4 Nvidia A100 GPUs on the other datasets. |
| Software Dependencies | No | The paper mentions the optimizer (AdamW) and normalization techniques (RMSNorm) and uses bfloat16 and float32 precision, but does not specify versions for major software frameworks (e.g., PyTorch, TensorFlow) or other libraries. |
| Experiment Setup | Yes | Table 15 shows dataset properties and hyperparameters used in the experiments. The models are trained for 2M and 4M iterations on CLEVR-TR and MSN-Hard, respectively, and for 300K iterations on both RealEstate10k and ACID. Learning rate 1e-4. Batch size 32. |
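The experiment-setup row can be collected into a single configuration object. This is a minimal illustrative sketch: the field names and the dictionary layout are assumptions for readability, not taken from the paper or its code release; only the values (optimizer, learning rate, batch size, iteration counts, precisions) come from the quotes above.

```python
# Hedged sketch of the reported training setup. Field names are illustrative
# assumptions; values are taken from the paper's reported hyperparameters.
train_config = {
    "optimizer": "AdamW",           # optimizer named in the paper
    "learning_rate": 1e-4,          # "Learning rate 1e-4"
    "batch_size": 32,               # "Batch size 32"
    "iterations": {                 # per-dataset training lengths
        "CLEVR-TR": 2_000_000,      # 2M iterations
        "MSN-Hard": 4_000_000,      # 4M iterations
        "RealEstate10k": 300_000,   # 300K iterations
        "ACID": 300_000,            # 300K iterations
    },
    "precision": ["bfloat16", "float32"],  # precisions mentioned in the paper
}

# Total training iterations across all four datasets.
total_iterations = sum(train_config["iterations"].values())
print(total_iterations)  # 6600000
```

A structured config like this also makes the "Software Dependencies: No" finding concrete: nothing in the reported setup pins framework versions, so a re-implementation would still have to guess them.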