EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive ablation studies and qualitative results verify the effectiveness of our method. EquiAV outperforms previous works across various audio-visual benchmarks. |
| Researcher Affiliation | Academia | 1Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea. Correspondence to: Jongsuk Kim <jskpop@kaist.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 EquiAV |
| Open Source Code | Yes | The code is available at https://github.com/JongSuk1/EquiAV |
| Open Datasets | Yes | We utilize two prominent audio-visual datasets for our experiments: AudioSet (Gemmeke et al., 2017) and VGGSound (Chen et al., 2020a). |
| Dataset Splits | No | The paper does not explicitly provide validation dataset splits with percentages or counts for reproduction. While it mentions "evaluation" clips for AudioSet and train/test splits for VGGSound, a distinct "validation" split is not specified. |
| Hardware Specification | Yes | 8× A6000 GPUs (pre-training), 8× A5000 GPUs (fine-tuning) |
| Software Dependencies | No | The paper mentions software components and techniques such as the AdamW optimizer, half-cycle cosine annealing (Loshchilov & Hutter, 2017), Vision Transformer, MAE, and SpecAugment, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The hyperparameter settings used in this paper are listed in Table D (e.g., optimizer: AdamW with momentum β1=0.9, β2=0.95; weight decay: 1e-5; learning rate scheduler: half-cycle cosine annealing (Loshchilov & Hutter, 2017); initial learning rate: 1e-6; peak learning rate: 1e-4; warm-up epochs: 2; epochs: 20; batch size: 256). A sketch of this configuration follows below. |
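
The following is a minimal sketch of the reported pre-training optimizer and learning-rate schedule, assuming a standard PyTorch training loop. The model and the per-epoch training step are placeholders; only the hyperparameter values (AdamW with β1=0.9, β2=0.95, weight decay 1e-5, linear warm-up from 1e-6 to a peak of 1e-4 over 2 epochs, half-cycle cosine annealing over 20 epochs, batch size 256) are taken from the paper's reported configuration.

```python
import math
import torch

# Values reported in the paper's hyperparameter table (Table D).
EPOCHS = 20
WARMUP_EPOCHS = 2
BATCH_SIZE = 256       # used when building the dataloader (elided here)
INIT_LR = 1e-6
PEAK_LR = 1e-4
WEIGHT_DECAY = 1e-5

# Placeholder module; the actual EquiAV audio-visual encoders are not shown.
model = torch.nn.Linear(512, 512)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=PEAK_LR,
    betas=(0.9, 0.95),
    weight_decay=WEIGHT_DECAY,
)

def lr_at_epoch(epoch: float) -> float:
    """Linear warm-up from INIT_LR to PEAK_LR, then half-cycle cosine decay."""
    if epoch < WARMUP_EPOCHS:
        return INIT_LR + (PEAK_LR - INIT_LR) * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for epoch in range(EPOCHS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... one pre-training epoch over batches of size BATCH_SIZE ...
```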