Learning Neural Acoustic Fields

Authors: Andrew Luo, Yilun Du, Michael Tarr, Josh Tenenbaum, Antonio Torralba, Chuang Gan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate that our model can faithfully represent the acoustic impulse response at seen and unseen locations. Additional ablation studies verify the importance of utilizing local geometric features to enable test time generation fidelity. Next, we demonstrate that learning acoustic fields could facilitate improved visual representations when training images are sparse. Finally we show that the learned NAF can be used to infer scene structure.
Researcher Affiliation | Collaboration | Andrew Luo, Carnegie Mellon University; Yilun Du, Massachusetts Institute of Technology; Michael J. Tarr, Carnegie Mellon University; Joshua B. Tenenbaum, MIT (BCS, CBMM, CSAIL); Antonio Torralba, Massachusetts Institute of Technology; Chuang Gan, UMass Amherst and MIT-IBM Watson AI Lab
Pseudocode | No | The paper describes algorithms and methods using text and diagrams (e.g., Figure 2) but does not include any formally labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Project site: https://www.andrew.cmu.edu/user/afluo/Neural_Acoustic_Fields
Open Datasets | Yes | For evaluating the learned acoustic fields, we use two different datasets. Soundspaces [Chen et al., 2020, Straub et al., 2019] is a synthetic dataset generated via ray-tracing... Mesh RIR [Koyama et al., 2021] is recorded from a real scene...
Dataset Splits | No | The paper states: 'For each scene, we holdout 10% of the RIRs randomly as a test set.' This fixes the test fraction, but it leaves the remaining 90% unspecified: no train/validation partition is given, which full reproducibility of the splits would require (a hedged split sketch follows the table).
Hardware Specification | Yes | Each scene is trained for 200 epochs, which takes around 6 hours for the largest scenes on four Nvidia V100s.
Software Dependencies | No | The paper does not name specific software dependencies with version numbers (e.g., a deep learning framework such as PyTorch or TensorFlow and the version used).
Experiment Setup | Yes | In each batch, we sample 20 impulse responses, and randomly select 2,000 frequency & time pairs within each spectrogram. An initial learning rate of 5e-4 is used for the network and the grid features. We add a small amount of noise sampled from N(0, 0.1) to each coordinate during training to prevent degenerate solutions. We utilize 64 coarse samples and 128 fine samples for each ray, and sample 1024 rays per batch. (See the sketch after the table.)
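
On the Dataset Splits row: the paper gives only the 10% random test holdout per scene, so anything beyond that is guesswork. A minimal sketch of how such a split might be reproduced, assuming a flat list of per-scene RIR files and a fixed seed (both hypothetical; the paper specifies neither):

```python
import random

def split_rirs(rir_files, test_frac=0.10, seed=0):
    """Hold out a random fraction of a scene's RIRs as the test set.

    Only the 10% test holdout is stated in the paper; the fixed seed and
    the use of the remaining 90% entirely for training (no validation
    split) are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    files = list(rir_files)
    rng.shuffle(files)
    n_test = max(1, int(len(files) * test_frac))
    return files[n_test:], files[:n_test]  # (train, test)
```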
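On the Experiment Setup row: the quoted numbers pin down the per-batch sampling. A hedged PyTorch sketch of that sampling, in which only the numeric hyperparameters come from the paper; the (N, F, T) spectrogram layout, the uniform sampling of frequency-time indices, and all function and variable names are assumptions:

```python
import torch

# Hyperparameters quoted in the paper.
RIRS_PER_BATCH = 20      # impulse responses sampled per batch
PAIRS_PER_SPEC = 2000    # random (frequency, time) pairs per spectrogram
LR = 5e-4                # initial learning rate for network and grid features
COORD_NOISE_STD = 0.1    # coordinate noise ~ N(0, 0.1) against degenerate fits
# For the sparse-view visual experiment (not used below):
# 64 coarse + 128 fine samples per ray, 1024 rays per batch.

def sample_batch(spectrograms: torch.Tensor, coords: torch.Tensor):
    """Draw one NAF training batch.

    spectrograms: (N, F, T) log-magnitude STFTs, one per emitter/listener pair
    coords:       (N, D) emitter/listener coordinates
    The (N, F, T) layout and uniform index sampling are assumptions.
    """
    n, f, t = spectrograms.shape
    idx = torch.randint(n, (RIRS_PER_BATCH,))                # pick 20 RIRs
    fi = torch.randint(f, (RIRS_PER_BATCH, PAIRS_PER_SPEC))  # frequency bins
    ti = torch.randint(t, (RIRS_PER_BATCH, PAIRS_PER_SPEC))  # time bins
    targets = spectrograms[idx[:, None], fi, ti]             # (20, 2000)
    noisy = coords[idx] + COORD_NOISE_STD * torch.randn_like(coords[idx])
    return noisy, fi, ti, targets
```

The 5e-4 learning rate applies to both the network and the grid features; the ray-sampling numbers (64 coarse, 128 fine, 1024 rays) belong to the sparse-view visual experiment and appear above only as a comment.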