EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive ablation studies and qualitative results verify the effectiveness of our method. EquiAV outperforms previous works across various audio-visual benchmarks.
Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea. Correspondence to: Jongsuk Kim <jskpop@kaist.ac.kr>.
Pseudocode | Yes | Algorithm 1 EquiAV
Open Source Code | Yes | The code is available at https://github.com/JongSuk1/EquiAV
Open Datasets | Yes | We utilize two prominent audio-visual datasets for our experiments: AudioSet (Gemmeke et al., 2017) and VGGSound (Chen et al., 2020a).
Dataset Splits | No | The paper does not explicitly provide validation dataset splits with percentages or counts for reproduction. It mentions "evaluation" clips for AudioSet and train/test splits for VGGSound, but no distinct validation split is specified.
Hardware Specification | Yes | GPUs: 8× A6000 (pre-training), 8× A5000 (fine-tuning)
Software Dependencies | No | The paper mentions software components and techniques such as the AdamW optimizer, half-cycle cosine annealing (Loshchilov & Hutter, 2017), Vision Transformer, MAE, and SpecAugment, but does not provide version numbers for any software dependency.
Experiment Setup | Yes | The hyperparameter settings used in this paper are listed in Table D, e.g., optimizer: AdamW with momentum β1 = 0.9, β2 = 0.95; weight decay: 1e-5; learning-rate scheduler: half-cycle cosine annealing (Loshchilov & Hutter, 2017); initial learning rate: 1e-6; peak learning rate: 1e-4; warm-up epochs: 2; epochs: 20; batch size: 256.
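As an illustration of the reported experiment setup, below is a minimal sketch (not the authors' released code) of how the listed pre-training hyperparameters could be wired into a PyTorch AdamW optimizer with linear warm-up followed by half-cycle cosine annealing. The model, the per-epoch training loop, and the steps-per-epoch are placeholders; only the numeric settings come from the paper's Table D.

```python
# Sketch only: hyperparameters from the paper's Table D; model and loop are placeholders.
import math
import torch

EPOCHS = 20
WARMUP_EPOCHS = 2
INIT_LR = 1e-6
PEAK_LR = 1e-4
WEIGHT_DECAY = 1e-5
BATCH_SIZE = 256  # reported batch size; actual steps per epoch depend on dataset size

model = torch.nn.Linear(8, 8)  # placeholder for the EquiAV audio/visual encoders

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=PEAK_LR,
    betas=(0.9, 0.95),
    weight_decay=WEIGHT_DECAY,
)

def lr_at_epoch(epoch: int) -> float:
    """Linear warm-up from INIT_LR to PEAK_LR, then half-cycle cosine decay."""
    if epoch < WARMUP_EPOCHS:
        return INIT_LR + (PEAK_LR - INIT_LR) * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# LambdaLR expects a multiplier on the base lr (PEAK_LR here).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: lr_at_epoch(epoch) / PEAK_LR
)

for epoch in range(EPOCHS):
    # ... one pass over the pre-training data with batch size 256 goes here ...
    optimizer.step()   # placeholder for the actual per-batch updates
    scheduler.step()   # advance the epoch-level learning-rate schedule
```

This schedule reproduces the reported values at the endpoints (1e-6 at epoch 0, 1e-4 at the end of warm-up); how the authors step the schedule per batch versus per epoch is not specified in the excerpt, so the per-epoch stepping above is an assumption.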