Is Attention All That NeRF Needs?

Authors: Mukund Varma T, Peihao Wang, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to compare GNT against state-of-the-art methods for novel view synthesis. Our experiment settings include both per-scene optimization and cross-scene generalization.
Researcher Affiliation | Collaboration | ¹Indian Institute of Technology Madras, ²University of Texas at Austin, ³Google Research
Pseudocode | Yes | We provide a simple and efficient PyTorch pseudocode to implement the attention operations in the view and ray transformer blocks in Alg. 1 and 2 respectively. We omit the feedforward and layer-normalization operations for simplicity. As seen in Alg. 3, we reuse the epipolar view features Xj to derive keys and values across view transformer blocks.
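The attention operations described in the quoted pseudocode can be sketched as below. This is a minimal illustration under assumed tensor shapes and function names, not a reproduction of the paper's Alg. 1–3: multi-head projections, feedforward layers, and layer normalization are omitted, mirroring the simplification the authors themselves describe.

```python
import torch
import torch.nn.functional as F

def cross_view_attention(query, view_feats):
    """Sketch of the view-transformer idea: each 3D point's query attends
    over its epipolar features from N source views. Shapes are assumptions:
    query [n_pts, d], view_feats [n_views, n_pts, d]."""
    d = query.shape[-1]
    # scaled dot-product logits between each point and its per-view features
    logits = torch.einsum('pd,vpd->pv', query, view_feats) / d ** 0.5
    weights = F.softmax(logits, dim=-1)  # softmax over source views
    return torch.einsum('pv,vpd->pd', weights, view_feats)

def ray_attention(point_feats):
    """Sketch of the ray-transformer idea: self-attention among the
    sampled points along a single ray. point_feats: [n_pts, d]."""
    d = point_feats.shape[-1]
    logits = point_feats @ point_feats.T / d ** 0.5
    weights = F.softmax(logits, dim=-1)  # softmax over points on the ray
    return weights @ point_feats
```

Both functions preserve the per-point feature shape, so they can be stacked in alternating view/ray blocks as the paper describes.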
Open Source Code | No | Please refer to our project page for video results: https://vita-group.github.io/GNT/. The link is given for video results only; there is no explicit statement or other link for the source code of the method.
Open Datasets | Yes | Local Light Field Fusion (LLFF) dataset: introduced by Mildenhall et al. (2019), it consists of 8 forward-facing captures of real-world scenes using a smartphone.
Dataset Splits | Yes | In these experiments, we use the same resolution and train/test splits as NeRF (Mildenhall et al., 2020).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and provides 'PyTorch-like pseudocode', but does not specify version numbers for any software dependencies such as PyTorch or CUDA.
Experiment Setup | Yes | The base learning rates for the feature extraction network and GNT are 10⁻³ and 5×10⁻⁴ respectively, which decay exponentially over training steps. For all our experiments, we train for 250,000 steps with 4,096 rays sampled in each iteration. Unlike most NeRF methods, we do not use separate coarse and fine networks; therefore, to bring GNT to a comparable experimental setup, we sample 192 coarse points per ray across all experiments (unless otherwise specified).
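The quoted setup can be approximated in PyTorch as follows. The learning rates, step count, ray batch size, and points per ray are taken from the quote; the two placeholder modules and the decay endpoint of the exponential schedule (10× over the full run) are assumptions, since the excerpt does not state them.

```python
import torch

# Placeholder modules standing in for the real feature extractor and GNT
# (assumptions; the actual architectures are described in the paper).
feature_net = torch.nn.Linear(3, 64)
gnt = torch.nn.Linear(64, 4)

# Per-module base learning rates, as quoted: 1e-3 and 5e-4.
optimizer = torch.optim.Adam([
    {'params': feature_net.parameters(), 'lr': 1e-3},
    {'params': gnt.parameters(), 'lr': 5e-4},
])

TOTAL_STEPS = 250_000   # training iterations
RAYS_PER_BATCH = 4096   # rays sampled per iteration
POINTS_PER_RAY = 192    # coarse samples per ray (no separate fine network)

# Exponential decay over training steps; the per-step gamma below makes the
# learning rate fall by ~10x over the full run (an assumed endpoint).
scheduler = torch.optim.lr_scheduler.ExponentialLR(
    optimizer, gamma=0.1 ** (1 / TOTAL_STEPS))
```

In a training loop, `optimizer.step()` and `scheduler.step()` would be called once per iteration over a batch of 4,096 rays.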