Active Contrastive Learning of Audio-Visual Video Representations

Authors: Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model achieves state-of-the-art performance on challenging audio and visual downstream benchmarks including UCF101, HMDB51 and ESC50.
Researcher Affiliation | Collaboration | Shuang Ma, Microsoft, Redmond, WA, USA; Zhaoyang Zeng, Sun Yat-sen University, Guangzhou, China; Daniel McDuff, Microsoft Research, Redmond, WA, USA; Yale Song, Microsoft Research, Redmond, WA, USA
Pseudocode | Yes | Algorithm 1 describes our proposed cross-modal active contrastive coding... Algorithm 2 Cross-Modal Active Contrastive Coding (Detailed version of Algorithm 1)... Algorithm 3 k-MEANS++ INIT Seed Cluster Initialization... Algorithm 4 Cross-Modal Contrastive Coding without Active Sampling
Open Source Code | Yes | Code is available at: https://github.com/yunyikristy/CM-ACC
Open Datasets | Yes | When pretrained on AudioSet (Gemmeke et al., 2017), our approach achieves new state-of-the-art classification performance on UCF101 (Soomro et al., 2012), HMDB51 (Kuehne et al., 2011), and ESC50 (Piczak, 2015b).
Dataset Splits | Yes | UCF101 and HMDB51 have 3 official train/test splits, while ESC50 has 5 splits. We conduct our ablation study using split-1 of each dataset. We report our average performance over all splits when we compare with prior work.
Hardware Specification | Yes | We used 40 NVIDIA Tesla P100 GPUs for our experiments.
Software Dependencies | No | All models are trained end-to-end with the ADAM optimizer (Kingma & Ba, 2014) (The optimizer is named, but no software dependencies or version numbers are given.)
Experiment Setup | Yes | All models are trained end-to-end with the ADAM optimizer (Kingma & Ba, 2014) with an initial learning rate γ = 10^-3 after a warm-up period of 500 iterations. We use the mini-batch size M = 128, dictionary size K = 30 × 128, pool size N = 300 × 128, momentum m = 0.999, and temperature τ = 0.7.
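
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes the derived dictionary and pool sizes explicit. The sketch below is illustrative only and is not taken from the authors' repository: the class name `PretrainConfig`, its field names, and the `learning_rate` helper are hypothetical. The linear warm-up shape and the constant rate after warm-up are assumptions, since the quoted text states only that a 500-iteration warm-up precedes the initial learning rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""

    base_lr: float = 1e-3             # initial learning rate (gamma)
    warmup_iters: int = 500           # warm-up period, in iterations
    batch_size: int = 128             # mini-batch size (M)
    dictionary_size: int = 30 * 128   # dictionary size (K)
    pool_size: int = 300 * 128        # pool size (N)
    momentum: float = 0.999           # momentum (m)
    temperature: float = 0.7          # temperature (tau)


def learning_rate(step: int, cfg: PretrainConfig) -> float:
    """Return the learning rate at a zero-based training step.

    ASSUMPTION: a linear ramp up to base_lr over the warm-up period,
    then a constant rate. The source does not specify either choice.
    """
    if step < cfg.warmup_iters:
        return cfg.base_lr * (step + 1) / cfg.warmup_iters
    return cfg.base_lr


if __name__ == "__main__":
    cfg = PretrainConfig()
    print("dictionary size K:", cfg.dictionary_size)
    print("pool size N:", cfg.pool_size)
    for step in (0, 249, 499, 500):
        print(step, learning_rate(step, cfg))
```

Under these values the pool holds ten times as many entries as the dictionary (300 × 128 versus 30 × 128).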