Deep Neural Room Acoustics Primitive

Authors: Yuhang He, Anoop Cherian, Gordon Wichern, Andrew Markham

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We present experiments on both synthetic and real-world datasets, demonstrating superior quality in RIR estimation against closely related methods. To empirically validate the superiority of DeepNeRAP, we conduct experiments on both synthetic and real-world datasets." |
| Researcher Affiliation | Collaboration | "Yuhang He¹, Anoop Cherian², Gordon Wichern², Andrew Markham¹ ... ¹Department of Computer Science, University of Oxford, Oxford, UK; ²Mitsubishi Electric Research Labs, Cambridge, MA, US." |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "The detailed network architecture is illustrated in Sec. A.5 in the Appendix and the source code is given in the supplementary material." |
| Open Datasets | Yes | "For the former [synthetic dataset], we use the large-scale SoundSpaces 2.0 dataset (Chen et al., 2022; Chang et al., 2017), consisting of indoor scenes with an average room area > 100 m² and enriched with room acoustics. For the latter [real-world dataset], we use the real-world MeshRIR dataset (Koyama et al., 2021)." |
| Dataset Splits | No | The paper specifies train/test splits for both the synthetic (3000/1000) and real-world (10k/4k) datasets, but does not explicitly mention a validation split. |
| Hardware Specification | Yes | "We train DeepNeRAP on an A40 GPU with the Adam optimizer (Kingma & Ba, 2015), with an initial learning rate of 0.0005 that decays every 50 epochs with a decay rate of 0.5." |
| Software Dependencies | No | The paper mentions 'We implement DeepNeRAP in PyTorch' but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | "We implement DeepNeRAP in PyTorch. The detailed network architecture is illustrated in Sec. A.5 in the Appendix and the source code is given in the supplementary material. We train DeepNeRAP on an A40 GPU with the Adam optimizer (Kingma & Ba, 2015), with an initial learning rate of 0.0005 that decays every 50 epochs with a decay rate of 0.5. We train all models for 300 epochs. The learnable room acoustic representation M consists of 500 × 500 × 2 entries, i.e., the grid is 500 × 500 and each entry is associated with a learnable feature of size 2. The scale number is L = 256 and the scale resolution is r = 2.1. The aggregated room acoustic feature representation for one position is of size 512. The primitive encoder E network consists of 6 multi-layer perceptron (MLP) layers, each of which consists of a fully-connected layer, a batch normalization layer, and a ReLU activation layer. Each MLP layer's hidden unit number is 512." |
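The Experiment Setup row pins down enough architectural detail to sketch in code. Below is a minimal PyTorch sketch of the learnable representation M (a 500 × 500 grid with a 2-dimensional learnable feature per entry) and the primitive encoder E (six MLP layers of Linear → BatchNorm → ReLU with 512 hidden units). The class names, the grid-lookup interface, and the assumption that E consumes the 512-dimensional aggregated position feature are illustrative guesses, not the authors' released code.

```python
import torch
import torch.nn as nn

class LearnableAcousticGrid(nn.Module):
    """Hypothetical stand-in for the learnable representation M:
    a 500 x 500 grid where each entry holds a learnable feature of size 2."""
    def __init__(self, grid_size=500, feat_dim=2):
        super().__init__()
        self.grid = nn.Parameter(0.01 * torch.randn(grid_size, grid_size, feat_dim))

    def forward(self, ij):
        # ij: (B, 2) integer grid indices -> (B, feat_dim) per-position features.
        return self.grid[ij[:, 0], ij[:, 1]]

class PrimitiveEncoder(nn.Module):
    """Sketch of the primitive encoder E: 6 MLP layers, each a fully-connected
    layer + batch normalization + ReLU, with 512 hidden units per layer."""
    def __init__(self, in_dim=512, hidden_dim=512, num_layers=6):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ReLU()]
            dim = hidden_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Usage: look up toy grid positions and encode a batch of aggregated features.
grid = LearnableAcousticGrid()
encoder = PrimitiveEncoder()
positions = torch.randint(0, 500, (8, 2))  # toy grid indices
per_pos = grid(positions)                  # (8, 2) raw grid features
feats = encoder(torch.randn(8, 512))       # (8, 512) encoded features
```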
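The training recipe quoted in the Hardware Specification and Experiment Setup rows maps onto standard PyTorch components: Adam with an initial learning rate of 0.0005, halved every 50 epochs, for 300 epochs total. A minimal sketch follows, assuming a StepLR scheduler is an adequate stand-in for the paper's decay schedule; the model and the elided training step are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))  # placeholder model

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # initial learning rate 0.0005
# Decay the learning rate by a factor of 0.5 every 50 epochs, as quoted above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(300):  # "We train all models for 300 epochs."
    # ... one pass over the training data would go here (omitted) ...
    scheduler.step()
```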