Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
Authors: Yan-Bo Lin, Yu-Chiang Frank Wang
AAAI 2021, pp. 2056-2063
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets confirm the effectiveness of our proposed framework in both semi-supervised and fully supervised scenarios, with ablation studies and visualization further supporting the use of our model for audio spatialization. |
| Researcher Affiliation | Collaboration | Yan-Bo Lin (1) and Yu-Chiang Frank Wang (1, 2). (1) Graduate Inst. Communication Engineering, National Taiwan University, Taiwan; (2) ASUS Intelligent Cloud Services, Taiwan |
| Pseudocode | No | The paper describes its method using mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for its methodology. |
| Open Datasets | Yes | FAIR-PLAY (Gao and Grauman 2019a). The FAIR-PLAY dataset consists of 1,871 10-second video clips with binaural recordings. REC-STREET (Pedro Morgado and Wang 2018). YT-CLEAN (Pedro Morgado and Wang 2018). YT-MUSIC (Pedro Morgado and Wang 2018). |
| Dataset Splits | Yes | For the train/val/test split, we follow the given splits of the FAIR-PLAY dataset. |
| Hardware Specification | Yes | We implement our model using PyTorch (Paszke et al. 2019) and train our model on a single NVIDIA GTX 1080 Ti GPU with 12 GB memory. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2019)' but does not specify a version number for PyTorch or for other software components used. |
| Experiment Setup | Yes | For audio, the raw audio data are resampled at 16 kHz. For the STFT, we use a Hann window of length 25 ms, an FFT size of 512, and a hop length of 10 ms. During training, we randomly sample one 0.63 s audio segment from a video together with the corresponding video frame. For testing, we sample all audio segments in a video with a 0.05 s hop size. |