AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages of our method on this real-world dataset and the simulation-based SoundSpaces dataset. We recommend that readers visit our project page for convincing comparisons: https://liangsusan-git.github.io/project/avnerf/.
Researcher Affiliation | Collaboration | ¹University of Rochester, ²Meta Reality Labs Research
Pseudocode | No | The paper illustrates the pipeline of the method with figures (Figures 2, 3, and 4) but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "We recommend that readers visit our project page for convincing comparisons: https://liangsusan-git.github.io/project/avnerf/." and "We will release this dataset to the research community." However, it does not explicitly state that the source code for the methodology will be released or provide a direct link to a code repository.
Open Datasets | Yes | To facilitate our study, we curated a high-quality audio-visual dataset called RWAVS (Real-World Audio-Visual Synthesis), which encompasses multimodal data, including camera poses, video frames, and realistic binaural (stereo) audios. ... We will release this dataset to the research community. ... Additionally, we utilize the (synthetic) SoundSpaces dataset [4] to validate our method.
Dataset Splits | Yes | We split 80% data as training samples and the rest as validation samples. ... We maintain the same training/test split as NAF, allocating 90% data for training and 10% data for testing.
Hardware Specification | No | The paper states, "We implement our method using the PyTorch framework [59]," but does not provide specific hardware details such as GPU models, CPU types, or cloud computing specifications used for running experiments.
Software Dependencies | No | The paper mentions software like the "PyTorch framework [59]", "COLMAP [46]", "Adobe Audition [47]", "Habitat-Sim simulator [48, 49]", and "nerf-studio [20]", but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | We employ Adam optimizer [60] with β1 = 0.9 and β2 = 0.999 for model optimization. The initial learning rate is set to 5e-4 and exponentially decreased to 5e-6. We train the model for 100 epochs with a batch size of 32.
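The stated schedule (initial learning rate 5e-4, exponentially decayed to 5e-6 over 100 epochs) pins down the per-epoch decay factor even though the paper does not report it explicitly. A minimal sketch of that inferred schedule, assuming the decay is applied once per epoch (the exact decay granularity is an assumption, not stated in the paper):

```python
# Reconstruction of the exponential learning-rate schedule described in the
# paper's experiment setup. The per-epoch factor `gamma` is inferred from the
# stated start/end rates; the paper gives only the endpoints.

INITIAL_LR = 5e-4
FINAL_LR = 5e-6
EPOCHS = 100

# Choose gamma so that INITIAL_LR * gamma**EPOCHS == FINAL_LR.
gamma = (FINAL_LR / INITIAL_LR) ** (1.0 / EPOCHS)


def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` epochs of exponential decay."""
    return INITIAL_LR * gamma ** epoch
```

In a PyTorch implementation this would correspond to `torch.optim.lr_scheduler.ExponentialLR` with the same `gamma`, stepped once per epoch alongside the Adam optimizer described above.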