Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

Authors: Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellotti, Donglai Wei

ICML 2024

Reproducibility checklist (variable: result, followed by the supporting LLM response):
Research Type: Experimental — "4. Experiments; 4.1. Experimental Setup; 4.2. Benchmark Results; 4.3. Ablation Studies."
Researcher Affiliation: Academia — "1 Boston College, Boston, MA, USA; 2 Carnegie Mellon University, Pittsburgh, PA, USA, and Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; 3 Tulane University, New Orleans, LA, USA."
Pseudocode: No — The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code: No — The paper does not provide any explicit statement about making the source code publicly available, nor a link to a code repository.
Open Datasets: Yes — "Datasets. In this work, we conduct experiments on five datasets: 1. Something-Something v2 (SSv2)...; 2. Something-Else (Sth-Else)...; 3. HMDB-51...; 4. UCF-101...; 5. Kinetics... For the experiments that perform training and testing on the Sth-Else dataset, we use the official split of data (Materzynska et al., 2020; Herzig et al., 2022; Ben Avraham et al., 2022). The other datasets we use for novel data are SSv2 (Goyal et al., 2017), SSv2-small (Zhu & Yang, 2018), HMDB-51 (Kuehne et al., 2011), and UCF-101 (Soomro et al., 2012)."
Dataset Splits: Yes — "We carry out two types of few-shot learning experiments: all-way-k-shot and 5-way-k-shot. ... Once we partition our data into base set D and novel set S ∪ Q, the number k determines how many samples for each action label we choose for S, and the rest are used for the query set Q. ... The phase 1 training is on the base data D, where the video action labels are from the set Cbase. The novel data consists of two parts: the support set S for updating the model and the query set Q for inference. ... The phase 2 training updates the model on the support set S, aiming to improve the inference accuracy for novel classes in Q."
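The support/query partition described above can be sketched in a few lines of Python. This is an illustrative reconstruction of the episode-splitting logic, not code from the paper; the function name `split_novel_set` and the toy video identifiers are our own.

```python
import random

def split_novel_set(samples_by_class, k, seed=0):
    """Partition novel-class samples into a k-shot support set S and a
    query set Q: k samples per class go to S, the rest to Q.
    (Hypothetical sketch of the split described in the paper.)"""
    rng = random.Random(seed)
    support, query = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        support[label] = shuffled[:k]   # k shots per class for phase-2 updates
        query[label] = shuffled[k:]     # remaining samples for inference
    return support, query

# Example: a 5-way-1-shot episode over toy video ids
novel = {c: [f"{c}_vid{i}" for i in range(4)] for c in ["a", "b", "c", "d", "e"]}
S, Q = split_novel_set(novel, k=1)
```

With k = 1 each class contributes one clip to S and its remaining clips to Q, matching the 5-way-1-shot setting; increasing k only moves more clips per class into S.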
Hardware Specification: Yes — "Our models are implemented using PyTorch, and experiments are conducted on four Nvidia GeForce 2080Ti graphics cards."
Software Dependencies: No — The paper mentions PyTorch but does not provide specific version numbers for it or for any other key software components used in the experiments.
Experiment Setup: Yes — "We use the AdamW optimizer (Loshchilov & Hutter, 2019) and cosine annealing to train our network with a learning rate initialized at 0.002 and weight decay of 10^-2. For all video sequences we use T = 16 uniformly selected frames. To compute the ELBO loss, we choose β = 0.02 to balance the reconstruction loss and KL-divergence. Also, we set τ = 0.07 for the NCE loss. Regarding the hyperparameters, we set d = 12 in Eq. 1, and S = 35 in Eq. 4."
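The reported optimization setup (AdamW, cosine-annealing schedule, initial learning rate 0.002, weight decay 10^-2) can be summarized in a minimal sketch. The schedule below is the standard cosine-annealing formula; the minimum learning rate (0) and the total step count are our assumptions, as the paper does not state them.

```python
import math

# Hyperparameters reported in the paper's experimental setup
INIT_LR = 2e-3       # initial learning rate for AdamW
WEIGHT_DECAY = 1e-2  # AdamW weight decay
BETA_ELBO = 0.02     # balances reconstruction loss vs. KL-divergence in the ELBO
TAU_NCE = 0.07       # temperature for the NCE loss
NUM_FRAMES = 16      # T uniformly sampled frames per video

def cosine_annealing_lr(step, total_steps, eta_max=INIT_LR, eta_min=0.0):
    """Standard cosine annealing:
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * step / total_steps))
```

In PyTorch this corresponds to pairing `torch.optim.AdamW(params, lr=2e-3, weight_decay=1e-2)` with `torch.optim.lr_scheduler.CosineAnnealingLR`; the pure-Python function above just makes the schedule's shape explicit (it starts at 0.002 and decays smoothly to the assumed minimum of 0).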