reproducibilityindex.ai

STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment

Authors: Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental validation on multiple benchmarks shows that our method achieves a 3.69%p of relative performance gain in zero-shot retrieval tasks compared to strong continual learning baselines, while reducing memory consumption by 45%. Our code is available at https://cl-stella.github.io/. In this section, we experimentally validate the effectiveness of our method in task-free continual audio-video pre-training. We start by outlining our experimental setup in Sec. 5.1, covering datasets, evaluation methods, evaluation metrics, and baseline methods employed for our experiments. Subsequently, we present the experimental results and conduct a comprehensive analysis in Sec. 5.2.
Researcher Affiliation	Collaboration	Jaewoo Lee (1, *), Jaehong Yoon (2, *), Wonjae Kim (3), Yunji Kim (3), Sung Ju Hwang (1, 4). Affiliations: (1) KAIST, (2) UNC Chapel Hill, (3) NAVER AI Lab, (4) DeepAuto.
Pseudocode	Yes	Algorithm 1: Audio time chunk selection in a PyTorch-like style. Algorithm 2: Continual Pre-training of STELLA. Algorithm 3: Continual Pre-training of STELLA+.
Open Source Code	Yes	Our code is available at https://cl-stella.github.io/.
Open Datasets	Yes	We validate our method on continual audio-video pre-training over VGGSound (Chen et al., 2020) and AudioSet (Gemmeke et al., 2017) datasets, consisting of 10s videos. For downstream tasks, we use two audiovisual datasets: MSR-VTT (Xu et al., 2016) and AVE (Tian et al., 2020).
Dataset Splits	No	The paper describes training and test sets (e.g., MSR-VTT training dataset and test dataset yielding 6k and 0.9k video clips respectively), but does not explicitly define a separate validation split from the main datasets or tasks for model evaluation or hyperparameter tuning.
Hardware Specification	Yes	GPUs: 4 A100 or 4 V100
Software Dependencies	No	The paper mentions optimizers such as 'Adam' and 'AdamW' and provides 'PyTorch-like pseudo code', but it does not specify version numbers for any software dependencies or libraries (e.g., PyTorch version, CUDA version).
Experiment Setup	Yes	Table 6: Audio-Video pre-training and fine-tuning hyperparameters. This table specifies Optimizer, Learning rate, Weight decay, Learning rate schedule, Warmup epochs, Epoch, Batch size, and various audio/video processing parameters (e.g., Audio Random Time Shifting yes/no, Audio Norm Mean/STD, Video Multi Scale Crop yes/no, Video Norm Mean/STD).