Sound Adversarial Audio-Visual Navigation

Authors: Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two real-world 3D scan datasets (Replica and Matterport3D) verify the effectiveness and the robustness of the agent trained under our designed environment when transferred to the clean environment or the one containing sound attackers with random policy. See also Section 4 (Experiments).
Researcher Affiliation | Collaboration | (1) Beijing National Research Center for Information Science and Technology (BNRist), State Key Lab on Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University; (2) Institute for AI Industry Research (AIR), Tsinghua University; (3) College of Information Science and Engineering, Xinjiang University; (4) UT Austin; (5) JD Explore Academy, JD.com
Pseudocode | Yes | Algorithm 1: Sound Adversarial Audio-Visual Navigation
Open Source Code | Yes | Our code is available at appx. L and https://yyf17.github.io/SAAVN/tree/main.
Open Datasets | Yes | Our work is based on the SoundSpaces (Chen et al., 2020) platform and the Habitat simulator (Savva et al., 2019), with the publicly available datasets Replica (Straub et al., 2019) and Matterport3D (Chang et al., 2017), and the SoundSpaces audio dataset. ... Please refer to https://github.com/yyf17/SAAVN/blob/main/dataset.md for the detailed steps of downloading and processing the dataset.
Dataset Splits | No | The paper mentions '# Training Episodes' and '# Test Episodes' in Table 3 but does not provide explicit training, validation, and test dataset splits with percentages or sample counts for reproduction.
Hardware Specification | No | The paper mentions 'GPU hours' but does not provide specific details on the hardware used, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions software components such as PyTorch, Adam, PPO, MindSpore, CANN, and the Ascend AI Processor, but does not provide specific version numbers for any of these to ensure reproducibility.
Experiment Setup | Yes | We train our model with Adam with a learning rate of 2.5 × 10⁻⁴. The auditory and visual encoder outputs are 512 and 512, respectively. We use a one-layer bidirectional GRU with 512 hidden units... We use an entropy loss on the policy distribution with a coefficient of 0.01. We train the network for 30M agent steps on Replica and 60M on Matterport3D... Table 4 (algorithm parameters) includes: clip param 0.1, ppo epoch 4, num mini batch 1, value loss coef 0.5, entropy coef 0.02, learning rate 2.5 × 10⁻⁴, max grad norm 0.5, num steps 150, γ 0.99, τ 0.95, β 0.01, reward window size 50, success reward 10.0, slack reward -0.01, distance reward scale 1.0, hidden size 512, w1 1/6, w2 1/6, w3 1/6, w4 1/2. (An illustrative configuration sketch of these values appears below the table.)
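
The snippet below is a minimal, illustrative Python sketch that collects the Adam/PPO hyperparameters quoted in the Experiment Setup row and shows one common way the listed reward parameters are combined in SoundSpaces/Habitat-style navigation tasks. The dictionary keys, the function name `step_reward`, and the reward composition are assumptions made for illustration; they are not taken from the authors' released code.

```python
# Illustrative sketch only: values transcribed from the paper's experiment-setup
# description and Table 4; the structure around them is assumed, not the authors' code.

PPO_CONFIG = {
    "clip_param": 0.1,
    "ppo_epoch": 4,
    "num_mini_batch": 1,
    "value_loss_coef": 0.5,
    "entropy_coef": 0.02,      # value from Table 4; the running text quotes 0.01
    "lr": 2.5e-4,              # Adam learning rate
    "max_grad_norm": 0.5,
    "num_steps": 150,          # rollout length per update
    "gamma": 0.99,             # discount factor
    "tau": 0.95,               # GAE parameter
    "beta": 0.01,
    "reward_window_size": 50,
    "hidden_size": 512,        # GRU hidden units; audio and visual encoders also output 512-d
    # weights w1..w4 as reported in Table 4 (their exact role is not restated here)
    "w1": 1 / 6, "w2": 1 / 6, "w3": 1 / 6, "w4": 1 / 2,
}

SUCCESS_REWARD = 10.0
SLACK_REWARD = -0.01
DISTANCE_REWARD_SCALE = 1.0


def step_reward(prev_dist_to_goal: float,
                curr_dist_to_goal: float,
                reached_goal: bool) -> float:
    """Per-step navigation reward as typically composed in SoundSpaces-style tasks:
    a small slack penalty, a shaped term for reducing distance to the goal, and a
    success bonus. This composition is an assumption, not a quote from the paper."""
    reward = SLACK_REWARD
    reward += DISTANCE_REWARD_SCALE * (prev_dist_to_goal - curr_dist_to_goal)
    if reached_goal:
        reward += SUCCESS_REWARD
    return reward
```

Training budgets quoted above (30M agent steps on Replica, 60M on Matterport3D) would sit outside this configuration, in the trainer's outer loop.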