INRAS: Implicit Neural Representation for Audio Scenes

Authors: Kun Su, Mingfei Chen, Eli Shlizerman

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that INRAS outperforms existing approaches for representation and rendering of sounds for varying emitter-listener locations in all aspects, including the impulse response quality, inference speed, and storage requirements.
Researcher Affiliation | Academia | Department of Electrical & Computer Engineering, University of Washington, Seattle, USA; Department of Applied Mathematics, University of Washington, Seattle, USA.
Pseudocode | No | The paper describes the model architecture and training process in text but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | The inference code is available in the supplementary material. The full code will be available on GitHub after the review process.
Open Datasets | Yes | To evaluate our method, we use the SoundSpaces dataset, which consists of dense pairs of impulse responses generated by geometric sound propagation methods [54]. [54] Changan Chen, Unnat Jain, Carl Schissler, Sebastià Vicenç Amengual Garí, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, and Kristen Grauman. SoundSpaces: Audio-visual navigation in 3D environments. In European Conference on Computer Vision, pages 17–36. Springer, 2020.
Dataset Splits | No | For each scene, we use 90% of the data for training and hold out 10% for testing. The paper specifies training and testing splits but does not explicitly mention a separate validation split percentage or count. (A minimal split sketch is given below the table.)
Hardware Specification | Yes | We use AdamW optimizer [55] to train all models on a Tesla T4 GPU for 100 epochs with a batch size of 64.
Software Dependencies | No | We use PyTorch to implement all INRAS models. The paper mentions software such as PyTorch and pyroomacoustics but does not provide specific version numbers for reproducibility.
Experiment Setup | Yes | We use a fully connected layer in the Scatter module and Gather module. In the Bounce module, we use a 4-layer residual MLP. In the Listener module, we use a 6-layer residual MLP. In all MLPs, we use 256 neurons and set PReLU as the activation function. We use AdamW optimizer [55] to train all models on a Tesla T4 GPU for 100 epochs with a batch size of 64. The initial learning rate is set as 5e-4 and it gradually decreases by a factor of 0.95. (A hedged configuration sketch based on these details is given below the table.)
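
The 90%/10% per-scene split described in the Dataset Splits row could be reproduced, for example, with a random hold-out. The sketch below is an illustration only: the scene_dataset object, the split_scene helper, and the fixed seed are assumptions, not the authors' code.

import torch
from torch.utils.data import random_split

def split_scene(scene_dataset, train_frac=0.9, seed=0):
    # Hold out 10% of a scene's emitter-listener pairs for testing (assumed loader).
    n_total = len(scene_dataset)
    n_train = int(train_frac * n_total)
    n_test = n_total - n_train
    generator = torch.Generator().manual_seed(seed)  # fixed seed for a reproducible split
    return random_split(scene_dataset, [n_train, n_test], generator=generator)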
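
The Experiment Setup row can also be read as a minimal PyTorch configuration sketch. The layer widths, depths, PReLU activation, AdamW optimizer, initial learning rate, and 0.95 decay factor follow the quoted description; the input/output dimensions, the wiring between modules, the per-epoch ExponentialLR schedule, and all variable names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    # Stack of residual blocks built from 256-unit linear layers with PReLU activations.
    def __init__(self, dim=256, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.acts = nn.ModuleList([nn.PReLU() for _ in range(num_layers)])

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = x + act(layer(x))  # residual connection around each layer
        return x

# Single fully connected layers for the Scatter and Gather modules,
# a 4-layer residual MLP for Bounce, and a 6-layer residual MLP for Listener.
dim = 256
scatter = nn.Sequential(nn.Linear(dim, dim), nn.PReLU())
gather = nn.Sequential(nn.Linear(dim, dim), nn.PReLU())
bounce = ResidualMLP(dim=dim, num_layers=4)
listener = ResidualMLP(dim=dim, num_layers=6)

params = (list(scatter.parameters()) + list(gather.parameters())
          + list(bounce.parameters()) + list(listener.parameters()))

# AdamW with initial learning rate 5e-4, decayed by a factor of 0.95; the paper
# does not state the decay interval, so a per-epoch decay is assumed here.
optimizer = torch.optim.AdamW(params, lr=5e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

num_epochs, batch_size = 100, 64  # as reported; data loading and the loss are omitted
for epoch in range(num_epochs):
    # ... one pass over the 90% training split in batches of 64 ...
    scheduler.step()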