reproducibilityindex.ai

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

Authors: Yan-Bo Lin, Yu-Chiang Frank Wang
Pages: 2056-2063

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on benchmark datasets confirm the effectiveness of our proposed framework in both semi-supervised and fully supervised scenarios, with ablation studies and visualization further support the use of our model for audio spatialization.
Researcher Affiliation	Collaboration	Yan-Bo Lin (1) and Yu-Chiang Frank Wang (1, 2). (1) Graduate Inst. Communication Engineering, National Taiwan University, Taiwan; (2) ASUS Intelligent Cloud Services, Taiwan
Pseudocode	No	The paper describes its method using mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not mention providing open-source code for its methodology.
Open Datasets	Yes	FAIR-PLAY (Gao and Grauman 2019a). The FAIR-PLAY dataset consists of 1,871 10s clips of videos with binaural recording. REC-STREET (Pedro Morgado and Wang 2018). YT-CLEAN (Pedro Morgado and Wang 2018). YT-MUSIC (Pedro Morgado and Wang 2018).
Dataset Splits	Yes	As for the train/val/test split, we follow up given splits from FAIR-PLAY dataset.
Hardware Specification	Yes	We implement our model using PyTorch (Paszke et al. 2019) and train our model on a single NVIDIA GTX 1080 Ti GPU with 12 GB memory.
Software Dependencies	No	The paper mentions 'PyTorch (Paszke et al. 2019)', but does not specify a version number for PyTorch or other software components used.
Experiment Setup	Yes	As for audio settings in our experiments, the raw audio data are resampled at 16 kHz. As for the STFT setting, we use a Hann window of length 25 ms, FFT size of 512 and hop length of 10 ms. During training, we randomly sample one audio segment with 0.63 s in a video with the corresponding video frame. As for testing, we sample all the audio segments in a video with 0.05 s hop size.