Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Active Contrastive Learning of Audio-Visual Video Representations
Authors: Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves state-of-the-art performance on challenging audio and visual downstream benchmarks including UCF101, HMDB51 and ESC50. |
| Researcher Affiliation | Collaboration | Shuang Ma: Microsoft, Redmond, WA, USA; Zhaoyang Zeng: Sun Yat-sen University, Guangzhou, China; Daniel McDuff: Microsoft Research, Redmond, WA, USA; Yale Song: Microsoft Research, Redmond, WA, USA |
| Pseudocode | Yes | Algorithm 1 describes our proposed cross-modal active contrastive coding... Algorithm 2 Cross-Modal Active Contrastive Coding (Detailed version of Algorithm 1)... Algorithm 3 k-means++ Init: Seed Cluster Initialization... Algorithm 4 Cross-Modal Contrastive Coding without Active Sampling |
| Open Source Code | Yes | Code is available at: https://github.com/yunyikristy/CM-ACC |
| Open Datasets | Yes | When pretrained on AudioSet (Gemmeke et al., 2017), our approach achieves new state-of-the-art classification performance on UCF101 (Soomro et al., 2012), HMDB51 (Kuehne et al., 2011), and ESC50 (Piczak, 2015b). |
| Dataset Splits | Yes | UCF101 and HMDB51 have 3 official train/test splits, while ESC50 has 5 splits. We conduct our ablation study using split-1 of each dataset. We report our average performance over all splits when we compare with prior work. |
| Hardware Specification | Yes | We used 40 NVIDIA Tesla P100 GPUs for our experiments. |
| Software Dependencies | No | All models are trained end-to-end with the ADAM optimizer (Kingma & Ba, 2014) (No specific version numbers for Adam or other software dependencies are provided.) |
| Experiment Setup | Yes | All models are trained end-to-end with the ADAM optimizer (Kingma & Ba, 2014) with an initial learning rate $\gamma = 10^{-3}$ after a warm-up period of 500 iterations. We use the mini-batch size $M = 128$, dictionary size $K = 30 \times 128$, pool size $N = 300 \times 128$, momentum $m = 0.999$, and temperature $\tau = 0.7$. |
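
Algorithm 3 in the Pseudocode row is a k-means++ seeding step. As a hedged illustration only, the sketch below implements standard k-means++ seeding over generic feature vectors in plain Python. The first seed is drawn uniformly, and each later seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. The function names and toy points are hypothetical. This table does not describe how CM-ACC builds the vectors it seeds on or how the seeds are used to fill the dictionary, and the sketch does not attempt to reproduce either.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points: List[Vector], k: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: indices of k seeds picked by standard k-means++ seeding."""
    if not 0 < k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    # First seed: uniform over all points.
    chosen = [rng.randrange(len(points))]
    # d2[i] is the squared distance from point i to its nearest chosen seed.
    d2 = [squared_distance(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a seed; fall back to a uniform unchosen index.
            idx = rng.choice([i for i in range(len(points)) if i not in chosen])
        else:
            # D(x)^2 sampling: points that already are seeds have weight 0.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        chosen.append(idx)
        # Refresh each point's distance to its nearest seed, now including the new one.
        d2 = [min(d, squared_distance(p, points[idx])) for d, p in zip(d2, points)]
    return chosen


if __name__ == "__main__":
    # Two tight, duplicated clusters: the second seed must come from the other cluster.
    toy = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]
    print(kmeans_pp_seed(toy, k=2, rng=random.Random(0)))
```

The D(x)² weighting is what makes the seeds spread out: a candidate close to an existing seed is unlikely to be drawn again, so the selected set covers the feature space more evenly than uniform sampling would.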
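
The momentum m = 0.999, dictionary size K, and temperature τ = 0.7 in the Experiment Setup row are the usual hyperparameters of a momentum-contrast (MoCo-style) objective. The sketch below assumes that formulation, since the table itself does not state the loss. For one query, it computes the InfoNCE loss over one positive key and a set of negative keys, with similarities divided by τ, and it shows the exponential-moving-average update of the key-encoder weights controlled by m. In the cross-modal setting, one would presumably take the query from one modality (e.g., video) and the keys from the other (e.g., audio); that pairing is an assumption here. All names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(query: Vector, positive_key: Vector,
                  negative_keys: Sequence[Vector], tau: float = 0.7) -> float:
    """InfoNCE for one query: cross-entropy with the positive key at index 0."""
    q = l2_normalize(query)
    keys = [positive_key, *negative_keys]
    # Temperature-scaled cosine similarity between the query and every key.
    logits = [sum(a * b for a, b in zip(q, l2_normalize(k))) / tau for k in keys]
    # Numerically stable log-sum-exp over all logits.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params: Sequence[float], query_params: Sequence[float],
                    m: float = 0.999) -> List[float]:
    """EMA update of key-encoder weights: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


if __name__ == "__main__":
    # A query aligned with its positive key and far from both negatives gives a small loss.
    print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
    print(momentum_update([1.0, 2.0], [0.0, 0.0]))
```

With m this close to 1, the key encoder changes very slowly. This keeps keys that were computed at different iterations roughly comparable while they sit in a large dictionary.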
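
The Experiment Setup row lists the training hyperparameters. The sketch below collects them in one hypothetical config object and adds a learning-rate function. The table only says that the rate is 10⁻³ "after a warm-up period of 500 iterations". The linear ramp from 0 and the constant rate afterwards are assumptions, because neither the warm-up shape nor any later decay is stated. The sketch also makes the sizes explicit: K = 30 × 128 = 3,840 and N = 300 × 128 = 38,400, that is, 30 and 300 mini-batches of size M = 128.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CMACCConfig:
    """Hypothetical container for the hyperparameters in the Experiment Setup row."""
    base_lr: float = 1e-3          # learning rate gamma, reached after warm-up
    warmup_iters: int = 500        # warm-up length in iterations
    batch_size: int = 128          # mini-batch size M
    dictionary_batches: int = 30   # K = 30 x M
    pool_batches: int = 300        # N = 300 x M
    momentum: float = 0.999        # key-encoder momentum m
    temperature: float = 0.7       # contrastive temperature tau

    @property
    def dictionary_size(self) -> int:
        return self.dictionary_batches * self.batch_size

    @property
    def pool_size(self) -> int:
        return self.pool_batches * self.batch_size

    def learning_rate(self, step: int) -> float:
        """Assumed schedule: linear ramp from 0 over warmup_iters, then constant base_lr."""
        if step < 0:
            raise ValueError("step must be non-negative")
        return self.base_lr * min(1.0, step / self.warmup_iters)


if __name__ == "__main__":
    cfg = CMACCConfig()
    print(cfg.dictionary_size, cfg.pool_size)
    print([cfg.learning_rate(s) for s in (0, 250, 500, 10_000)])
```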