Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning
Authors: Jingran Zhang, Xing Xu, Fumin Shen, Huimin Lu, Xin Liu, Heng Tao Shen (pp. 3351–3359)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on both action video recognition and audio sound recognition tasks show the remarkably improved performance of the SSCL method compared with the state-of-the-art self-supervised audio-visual representation learning methods. |
| Researcher Affiliation | Academia | Jingran Zhang1, Xing Xu1*, Fumin Shen1, Huimin Lu2, Xin Liu3, Heng Tao Shen1 1Center for Future Multimedia and School of Computer Science and Engineering, University of Electronic Science and Technology of China, China 2Kyushu Institute of Technology, Japan 3Huaqiao University, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | For audio-visual pre-training, the standard dataset, Kinetics-400 (Kay et al. 2017), is exploited as an unlabeled benchmark to pre-train our model. We evaluate the visual representation f v with action recognition on the UCF-101 (Soomro, Zamir, and Shah 2012) and the HMDB-51 (Kuehne et al. 2011) datasets. Moreover, we also evaluate the audio representation f a with sound classification on the ESC-50 (Piczak 2015b) and the DCASE (Stowell et al. 2015) datasets. |
| Dataset Splits | No | The paper mentions using standard datasets (Kinetics-400, UCF-101, HMDB-51, ESC-50, and DCASE) and reports total video counts for some of them. However, it does not provide specific percentages or absolute counts for training, validation, and test splits, nor does it point to predefined splits with details or citations sufficient for reproducibility. |
| Hardware Specification | No | The paper states that experiments were conducted 'with experiments on 8 GPU cards,' but it does not specify the model or type of these GPU cards, or any other specific hardware details like CPU, memory, or cloud instance types. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries (e.g., PyTorch, TensorFlow), or CUDA versions used for the experiments. |
| Experiment Setup | Yes | The model is trained with SGD using a linear warm-up scheme at an initial learning rate of 0.03. The SGD weight decay is 10^-5 and the momentum is 0.9. The total number of epochs is 200 and the batch size is set to 128, with experiments on 8 GPU cards. The number of negative pairs K is set to 16,384 and the temperature parameter τ to 0.07. |
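The reported training configuration can be collected into a minimal sketch. This is not the authors' code (none is released); the warm-up length and the post-warm-up schedule are assumptions, since the paper only states that a linear warm-up is used:

```python
# Hyperparameters as reported in the paper: SGD, initial lr 0.03 with
# linear warm-up, weight decay 1e-5, momentum 0.9, 200 epochs,
# batch size 128, K = 16,384 negative pairs, temperature tau = 0.07.
BASE_LR = 0.03
WARMUP_EPOCHS = 5       # assumed; the paper does not state the warm-up length
TOTAL_EPOCHS = 200

def learning_rate(epoch: int) -> float:
    """Linear warm-up to BASE_LR; schedule after warm-up is unspecified,
    so we simply hold the rate constant here."""
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    return BASE_LR

sgd_config = {
    "lr": BASE_LR,
    "momentum": 0.9,
    "weight_decay": 1e-5,
}

contrastive_config = {
    "num_negatives": 16_384,  # K
    "temperature": 0.07,      # tau
    "batch_size": 128,
}
```

In a PyTorch-style setup, `sgd_config` would be passed to the optimizer constructor and `learning_rate(epoch)` applied per epoch; the dictionaries above only record what the paper states.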