INRAS: Implicit Neural Representation for Audio Scenes
Authors: Kun Su, Mingfei Chen, Eli Shlizerman
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that INRAS outperforms existing approaches for representation and rendering of sounds for varying emitter-listener locations in all aspects, including the impulse response quality, inference speed, and storage requirements. |
| Researcher Affiliation | Academia | Department of Electrical & Computer Engineering, University of Washington, Seattle, USA. Department of Applied Mathematics, University of Washington, Seattle, USA |
| Pseudocode | No | The paper describes the model architecture and training process in text but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | The inference code is available in the supplementary material. The full code will be available on GitHub after the review process. |
| Open Datasets | Yes | To evaluate our method, we use the SoundSpaces dataset which consists of dense pairs of impulse responses generated by geometric sound propagation methods [54]. [54] Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, and Kristen Grauman. SoundSpaces: Audio-visual navigation in 3D environments. In European Conference on Computer Vision, pages 17–36. Springer, 2020. |
| Dataset Splits | No | For each scene, we use 90% data for training and hold 10% data for testing. The paper specifies training and testing splits but does not explicitly mention a separate validation split percentage or count. (A minimal per-scene split sketch appears below the table.) |
| Hardware Specification | Yes | We use the AdamW optimizer [55] to train all models on a Tesla T4 GPU for 100 epochs with a batch size of 64. |
| Software Dependencies | No | We use PyTorch to implement all INRAS models. The paper mentions software such as PyTorch and pyroomacoustics but does not provide specific version numbers for reproducibility. |
| Experiment Setup | Yes | We use a fully connected layer in the Scatter module and Gather module. In the Bounce module, we use a 4-layer residual MLP. In the Listener module, we use a 6-layer residual MLP. In all MLPs, we use 256 neurons and set PReLU as the activation function. We use the AdamW optimizer [55] to train all models on a Tesla T4 GPU for 100 epochs with a batch size of 64. The initial learning rate is set to 5e-4 and gradually decreases by a factor of 0.95. |
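
For concreteness, the configuration quoted in the Experiment Setup row could be sketched in PyTorch roughly as follows. This is a minimal sketch, not the authors' released implementation: the module wiring, feature dimensions, loss, and placeholder data are assumptions made only to illustrate the stated layer counts, activation, optimizer, and schedule.

```python
# Illustrative sketch of the reported setup (1-layer Scatter/Gather, 4-layer
# residual Bounce MLP, 6-layer residual Listener MLP, 256 units, PReLU,
# AdamW at 5e-4 with 0.95 decay, batch size 64, 100 epochs).
import torch
import torch.nn as nn


class ResidualMLP(nn.Module):
    """Stack of residual fully connected blocks with PReLU activations."""

    def __init__(self, dim: int = 256, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.PReLU()) for _ in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = x + layer(x)  # residual connection around each block
        return x


class INRASSketch(nn.Module):
    """Toy stand-in for the Scatter / Bounce / Gather / Listener pipeline."""

    def __init__(self, in_dim: int = 64, hidden: int = 256, out_dim: int = 2):
        super().__init__()
        self.scatter = nn.Sequential(nn.Linear(in_dim, hidden), nn.PReLU())  # one FC layer
        self.bounce = ResidualMLP(hidden, num_layers=4)                      # 4-layer residual MLP
        self.gather = nn.Sequential(nn.Linear(hidden, hidden), nn.PReLU())   # one FC layer
        self.listener = ResidualMLP(hidden, num_layers=6)                    # 6-layer residual MLP
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.listener(self.gather(self.bounce(self.scatter(x)))))


model = INRASSketch()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
# Learning rate decays by a factor of 0.95 per epoch, as stated in the setup.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(100):
    # One placeholder batch of size 64 per epoch; a real loader would supply
    # emitter/listener features and target impulse responses, and the paper's
    # actual training loss is not an MSE on random tensors.
    features = torch.randn(64, 64)
    target = torch.randn(64, 2)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(features), target)
    loss.backward()
    optimizer.step()
    scheduler.step()
```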
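
Similarly, the per-scene 90%/10% train/test split noted in the Dataset Splits row could be implemented along these lines. The `split_scene` helper, the sample-list format, the shuffling, and the fixed seed are hypothetical choices, since the paper states only the ratio and not the sampling procedure.

```python
# Hypothetical per-scene 90/10 train/test split; the shuffle and seed are assumptions.
import random
from typing import List, Sequence, Tuple


def split_scene(samples: Sequence, train_frac: float = 0.9, seed: int = 0) -> Tuple[List, List]:
    """Shuffle one scene's emitter-listener samples and split them train/test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Example: 1000 impulse-response indices for a single scene.
train_ids, test_ids = split_scene(range(1000))
assert len(train_ids) == 900 and len(test_ids) == 100
```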