Segment beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Authors: Renjie Wu, Hu Wang, Feras Dayoub, Hsiang-Ting Chen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SBV outperforms existing models in comparative evaluations and shows consistent performance across varying FoV ranges and in monaural audio settings. Adapting the Omni Auditory Perception Dataset (Dai et al. 2022; Vasudevan, Dai, and Van Gool 2020) to the proposed task, the results suggest that the method outperforms state-of-the-art audio-visual semantic segmentation methods (Zhou et al. 2022, 2023). The paper also presents ablation studies examining various degrees of partially missing modality and different model architectures. |
| Researcher Affiliation | Academia | The University of Adelaide {renjie.wu, hu.wang, feras.dayoub, tim.chen}@adelaide.edu.au |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the model architecture and training process in text and diagrams, but without pseudocode formatting. |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link, an explicit code release statement, or mention of code in supplementary materials. |
| Open Datasets | Yes | Adapting the Omni Auditory Perception Dataset (Dai et al. 2022; Vasudevan, Dai, and Van Gool 2020) to the proposed task |
| Dataset Splits | Yes | In addition to the normal training dataset (51,400) and validation dataset (6,208), it contains two test datasets: Auditory Test Pseudo dataset (6,492) and Auditory Test Manual dataset. |
| Hardware Specification | Yes | We train models by using NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., Adam, OpenCV, SegFormer, SoundNet, ResNet50, DeepLabv3+), and provides citations for them, but does not specify their version numbers (e.g., OpenCV version 4.x, PyTorch version X.Y). |
| Experiment Setup | Yes | We use Adam (Kingma and Ba 2014) and set the learning rate as 1×10⁻⁵ for the optimizer. We use the one-cycle policy (Smith and Topin 2019) as our learning rate decay strategy. All images are resized to 480×480. The spectrogram size is set as 257×601. All student models are trained for 50 epochs to ensure that the loss converges. For Eqn. 7, we set βa = 0.1 and βv = 0.4 for logits distillation; for the feature distillation part, we set all λ = 0.05 and all γ = 0.02. |
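The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative reconstruction from the paper's text, not released code; all key names are my own, since the paper does not publish an implementation.

```python
# Hypothetical training configuration assembled from the paper's
# Experiment Setup description (key names are illustrative, not from
# any released code).
config = {
    "optimizer": "Adam",            # Kingma and Ba 2014
    "learning_rate": 1e-5,
    "lr_schedule": "one-cycle",     # Smith and Topin 2019
    "image_size": (480, 480),       # all images resized
    "spectrogram_size": (257, 601),
    "epochs": 50,                   # student models, trained to convergence
    "beta_a": 0.1,                  # audio logits-distillation weight (Eqn. 7)
    "beta_v": 0.4,                  # visual logits-distillation weight (Eqn. 7)
    "lambda_feat": 0.05,            # feature-distillation weight (all λ)
    "gamma_feat": 0.02,             # feature-distillation weight (all γ)
}
```

A reader attempting reproduction would still need details the paper omits, such as batch size and framework versions, which is consistent with the "No" verdict on Software Dependencies above.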