Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition
Authors: Yuke Li, Guangyi Chen, Ben Abramowitz, Stefano Anzellotti, Donglai Wei
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments, 4.1. Experimental Setup, 4.2. Benchmark Results, 4.3. Ablation Studies |
| Researcher Affiliation | Academia | ¹Boston College, Boston MA, USA; ²Carnegie Mellon University, Pittsburgh PA, USA; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; ³Tulane University, New Orleans LA, USA. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about making the source code publicly available or a link to a code repository. |
| Open Datasets | Yes | Datasets. In this work, we conduct experiments on five datasets: 1. Something-Something v2 (SSv2)...; 2. Something-Else (Sth-Else)...; 3. HMDB-51...; 4. UCF-101...; 5. Kinetics... For the experiments that perform training and testing on the Sth-Else dataset, we use the official split of data (Materzynska et al., 2020; Herzig et al., 2022; Ben Avraham et al., 2022). The other datasets we use for novel data are SSv2 (Goyal et al., 2017), SSv2-small (Zhu & Yang, 2018), HMDB-51 (Kuehne et al., 2011), and UCF101 (Soomro et al., 2012). |
| Dataset Splits | Yes (see the split sketch after the table) | We carry out two types of few-shot learning experiments: all-way-k-shot and 5-way-k-shot. ... Once we partition our data into base set D and novel set S ∪ Q, the number k determines how many samples for each action label we choose for S and the rest are used for the query set Q. ... The phase 1 training is on the base data D, where the video action labels are from the set Cbase. The novel data consists of two parts, the support set S for updating the model and the query set Q for inference. ... The phase 2 training updates the model on the support set S, aiming to improve the inference accuracy for novel classes in Q. |
| Hardware Specification | Yes | Our models are implemented using PyTorch, and experiments are conducted on four Nvidia GeForce 2080Ti graphics cards. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for it or any other key software components used in the experiments. |
| Experiment Setup | Yes (see the training-setup sketch after the table) | We use the AdamW optimizer (Loshchilov & Hutter, 2019) and cosine annealing to train our network with a learning rate initialized at 0.002 and weight decay of 10⁻². For all video sequences we use T = 16 uniformly selected frames. To compute the ELBO loss, we choose β = 0.02 to balance the reconstruction loss and KL-divergence. Also, we set τ = 0.07 for the NCE loss. Regarding the hyperparameters, we set d = 12 in Eq. 1, and S = 35 in Eq. 4. |
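The Dataset Splits row quotes the paper's few-shot protocol: a base set D for phase-1 training and novel data partitioned into a k-shot support set S and a query set Q. Since no code is released, the following is only a minimal sketch of that split rule; the function and variable names are hypothetical, and only the "k samples per class go to S, the rest to Q" behavior comes from the quoted text.

```python
import random
from collections import defaultdict

def make_episode(novel_videos, k, seed=0):
    """Partition novel-class videos into a k-shot support set S and a query set Q.

    `novel_videos` is a hypothetical list of (video_id, label) pairs whose labels
    belong to the novel classes; k samples per label go to S, the rest to Q.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for video_id, label in novel_videos:
        by_label[label].append(video_id)

    support, query = [], []
    for label, vids in by_label.items():
        rng.shuffle(vids)
        support += [(v, label) for v in vids[:k]]  # k shots per class -> support set S
        query += [(v, label) for v in vids[k:]]    # remaining samples -> query set Q
    return support, query
```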
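Similarly, the hyperparameters in the Experiment Setup row map onto a standard PyTorch optimizer/scheduler configuration. This is a sketch under assumptions: the placeholder module, the epoch count for cosine annealing, and the commented loss-combination form are not taken from the paper; only the quoted values (learning rate, weight decay, T, β, τ) are.

```python
import torch

model = torch.nn.Linear(512, 256)  # placeholder standing in for the paper's network

# AdamW with the quoted learning rate and weight decay, plus cosine annealing.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # epoch count is a placeholder

T_FRAMES = 16  # uniformly sampled frames per video sequence
BETA = 0.02    # weights the KL-divergence against the reconstruction loss in the ELBO
TAU = 0.07     # temperature for the NCE loss

# Assumed ELBO combination (form is an assumption, weight is from the paper):
# elbo_loss = reconstruction_loss + BETA * kl_divergence
```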