Deep Neural Room Acoustics Primitive
Authors: Yuhang He, Anoop Cherian, Gordon Wichern, Andrew Markham
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on both synthetic and real-world datasets, demonstrating superior quality in RIR estimation against closely related methods. To empirically validate the superiority of DeepNeRAP, we conduct experiments on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Yuhang He¹, Anoop Cherian², Gordon Wichern², Andrew Markham¹ ... ¹Department of Computer Science, University of Oxford, Oxford, UK; ²Mitsubishi Electric Research Labs, Cambridge, MA, US. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The detailed network architecture is illustrated in Sec. A.5 in Appendix and the source code is given in supplementary material. |
| Open Datasets | Yes | For the former [synthetic dataset], we use the large-scale SoundSpaces 2.0 dataset (Chen et al., 2022; Chang et al., 2017) consisting of indoor scenes with an average room area > 100 m² and enriched with room acoustics. For the latter [real-world dataset], we use the real-world MeshRIR dataset (Koyama et al., 2021). |
| Dataset Splits | No | The paper specifies train/test splits for both synthetic (3000/1000) and real-world (10k/4k) datasets, but does not explicitly mention a validation split. |
| Hardware Specification | Yes | We train DeepNeRAP on an A40 GPU with the Adam optimizer (Kingma & Ba, 2015), with an initial learning rate of 0.0005 that decays by a factor of 0.5 every 50 epochs. |
| Software Dependencies | No | The paper mentions 'We implement DeepNeRAP in PyTorch' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We implement DeepNeRAP in PyTorch. The detailed network architecture is illustrated in Sec. A.5 in the Appendix and the source code is given in the supplementary material. We train DeepNeRAP on an A40 GPU with the Adam optimizer (Kingma & Ba, 2015), with an initial learning rate of 0.0005 that decays by a factor of 0.5 every 50 epochs. We train all models for 300 epochs. The learnable room acoustic representation M is of size 500 × 500 × 2: the grid is 500 × 500, and each entry is associated with a learnable feature of size 2. The scale number is L = 256 and the scale resolution is r = 2.1. The aggregated room acoustic feature representation for one position has size 512. The primitive encoder E consists of 6 multi-layer perceptron (MLP) layers, each comprising a fully-connected layer, a batch normalization layer and a ReLU activation layer. Each MLP layer's hidden unit count is 512. |
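The experiment-setup details above (6-layer MLP encoder with fully-connected, batch-norm and ReLU sublayers; 500 × 500 × 2 learnable grid; Adam with learning rate 0.0005 halved every 50 epochs) can be sketched in PyTorch. This is a minimal illustration only: the class name `PrimitiveEncoder`, the input dimension of 512 (taken from the stated per-position feature size), and the encoder's output dimension are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class PrimitiveEncoder(nn.Module):
    """Sketch of the primitive encoder E described in the paper:
    6 MLP layers, each a fully-connected layer followed by batch
    normalization and a ReLU activation, with 512 hidden units.
    Input/output dimensions of 512 are an assumption here."""

    def __init__(self, in_dim: int = 512, hidden: int = 512, num_layers: int = 6):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            dim = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Learnable room acoustic representation M: a 500 x 500 grid whose
# entries each carry a learnable feature of size 2, per the paper.
M = nn.Parameter(torch.zeros(500, 500, 2))

model = PrimitiveEncoder()
# Adam with initial learning rate 0.0005, decayed by 0.5 every 50 epochs.
optimizer = torch.optim.Adam(list(model.parameters()) + [M], lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
```

`scheduler.step()` would be called once per epoch over the 300-epoch schedule, so the learning rate is halved at epochs 50, 100, 150, and so on.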