Acoustic Volume Rendering for Neural Impulse Response Fields

Authors: Zitong Lan, Chenhao Zheng, Zhiwei Zheng, Mingmin Zhao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that AVR surpasses current leading methods by a substantial margin. Additionally, we develop an acoustic simulation platform, AcoustiX, which provides more accurate and realistic impulse response (IR) simulations than existing simulators.
Researcher Affiliation | Academia | Zitong Lan (1), Chenhao Zheng (2), Zhiwei Zheng (1), Mingmin Zhao (1). (1) University of Pennsylvania; (2) University of Washington.
Pseudocode | No | The paper describes its methods and processes in narrative text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code for AVR and AcoustiX is available at https://zitonglan.github.io/avr.
Open Datasets | Yes | We evaluate our model's performance on the datasets collected from real scenes. We adopt two commonly used room impulse response datasets: MeshRIR [20] and Real Acoustic Field [10]. We use our simulation platform to simulate monaural impulse responses in three rooms and evaluate all methods' performance (Tab. 2). We also include two complicated 3D rooms from the iGibson dataset [24, 56].
Dataset Splits | No | We use 90% of the data to train and the remaining 10% for testing. The paper mentions a total loss that includes a multi-resolution STFT loss L_stft [58] and an energy loss L_energy similar to [30], but it does not specify a separate validation split.
Hardware Specification | Yes | The optimization process takes 24 hours on a single NVIDIA L40 GPU.
Software Dependencies | No | AcoustiX uses the Sionna ray tracing engine [16]. The paper does not provide version numbers for Sionna or for any other software dependencies, such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
Experiment Setup | Yes | The sampling numbers used in the experiments are N_θ = 80, N_ϕ = 40, and N_r = 64. We set the weights of the loss components to λ_amp = λ_phase = 0.5, λ_time = 100, λ_stft = 1, λ_energy = 5. We train our model for 200 epochs for each scene. We use the Adam optimizer with a cosine learning rate scheduler that starts at a learning rate of 10^-3 and decays to 10^-4.
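
The 90%/10% train/test split reported under Dataset Splits could be reproduced along the following lines. This is only a minimal sketch: the summary does not say whether the split is random or which seed is used, so the shuffling, the seed, and the helper name `split_indices` are all assumptions made for illustration.

```python
import random


def split_indices(num_samples, train_fraction=0.9, seed=0):
    """Shuffle sample indices and split them into train and test lists.

    Hypothetical helper: random shuffling and the fixed seed are
    assumptions, not details taken from the paper.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_train = int(round(train_fraction * num_samples))
    return indices[:num_train], indices[num_train:]


# Example with 1000 impulse responses (an arbitrary illustrative count).
train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
```

Because no validation split is described, any hyperparameter tuning in a reproduction would need its own held-out subset carved from the training portion.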
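
The optimizer schedule and loss weighting listed under Experiment Setup can be illustrated with a short framework-free sketch. It assumes the learning rates read as 1e-3 and 1e-4, that the cosine decay spans all 200 epochs, and that the total loss is a plain weighted sum of the five named terms. The function names `cosine_lr` and `total_loss` are hypothetical and are not taken from the authors' code.

```python
import math

# Values quoted in the Experiment Setup row.
LR_START = 1e-3
LR_END = 1e-4
NUM_EPOCHS = 200
LOSS_WEIGHTS = {"amp": 0.5, "phase": 0.5, "time": 100.0, "stft": 1.0, "energy": 5.0}


def cosine_lr(epoch, num_epochs=NUM_EPOCHS, lr_start=LR_START, lr_end=LR_END):
    """Cosine decay from lr_start (epoch 0) to lr_end (epoch num_epochs).

    Assumption: the decay covers the whole training run.
    """
    progress = min(max(epoch, 0), num_epochs) / num_epochs
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * progress))


def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of per-component loss values.

    Assumption: the components combine linearly with the quoted weights.
    """
    return sum(weights[name] * terms[name] for name in weights)


# Learning rate at the start, midpoint, and end of training.
for epoch in (0, 100, 200):
    print(epoch, cosine_lr(epoch))

# Toy loss values, chosen only to show how the weights scale each term.
example_terms = {"amp": 1.0, "phase": 1.0, "time": 1.0, "stft": 1.0, "energy": 1.0}
print(total_loss(example_terms))
```

In a PyTorch reproduction, a comparable schedule could be obtained with `torch.optim.lr_scheduler.CosineAnnealingLR` using `T_max=200` and `eta_min=1e-4`, although the summary does not confirm which scheduler implementation the authors used.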