Self-Supervised Video Representation Learning via Latent Time Navigation

Authors: Di Yang, Yaohui Wang, Quan Kong, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, François Brémond

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental analysis suggests that learning video representations by LTN consistently improves performance of action classification in fine-grained and human-oriented tasks (e.g., on the Toyota Smarthome dataset). In addition, we demonstrate that our proposed model, when pre-trained on Kinetics-400, generalizes well onto the unseen real-world video benchmark datasets UCF101 and HMDB51, achieving state-of-the-art performance in action recognition.
Researcher Affiliation | Collaboration | 1. Inria, 2004 Rte des Lucioles, Valbonne, France; 2. Université Côte d'Azur, 28 Av. de Valrose, Nice, France; 3. Toyota Motor Europe, 60 Av. du Bourget, Brussels, Belgium; 4. Woven Planet Holdings, 3-2-1 Nihonbashimuromachi, Chuo-ku, Tokyo, Japan; 5. Shanghai AI Laboratory, 701 Yunjin Road, Shanghai, China
Pseudocode | No | The paper describes the proposed approach through text and diagrams (Figure 2) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access information for the source code of the described methodology (e.g., a specific repository link or an explicit code-release statement).
Open Datasets | Yes | We conduct extensive experiments to evaluate LTN on four action classification datasets: Toyota Smarthome, Kinetics-400, UCF101 and HMDB51.
Dataset Splits | No | The paper mentions using the Smarthome Cross-Subject protocol, Kinetics-400, UCF101, and HMDB51, and discusses evaluation protocols (e.g., linear evaluation, fine-tuning; see the linear-evaluation sketch after the table), but it does not explicitly provide percentages or sample counts for the training, validation, and test splits.
Hardware Specification | No | The paper states 'This work was granted access to the HPC resources of IDRIS under the allocation AD011011627R1,' but it does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions such as Python 3.8 or PyTorch 1.9) needed to replicate the experiments.
Experiment Setup | Yes | For the proposed Dt, unless otherwise stated, we set M = 64 directions over the dim = 2048 dimensions... The results shown in Tab. 3 suggest that a 2-layer MLP with 2048 dimensions in the hidden layer is the most effective. (See the module sketch after the table.)
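
For the linear-evaluation protocol mentioned in the Dataset Splits row, the following is a minimal sketch of how such a probe is typically run: freeze the pre-trained backbone and train only a linear classifier on top of its features. This is a generic illustration, not the authors' code; `backbone`, `feat_dim`, and `loader` are placeholder names we introduce here.

```python
import torch
import torch.nn as nn

def linear_eval(backbone, feat_dim, num_classes, loader, epochs=10, lr=1e-3, device="cpu"):
    """Train a linear probe on top of a frozen, pre-trained backbone."""
    backbone.eval()                          # frozen feature extractor
    for p in backbone.parameters():
        p.requires_grad = False

    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for clips, labels in loader:
            clips, labels = clips.to(device), labels.to(device)
            with torch.no_grad():            # backbone features only, no gradient
                feats = backbone(clips)
            loss = ce(clf(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```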
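
To make the configuration quoted in the Experiment Setup row concrete, below is a minimal PyTorch sketch of the two pieces it names: a 2-layer MLP projection head with a 2048-d hidden layer (reported most effective in Tab. 3), and a bank of M = 64 learnable latent-time directions over the dim = 2048 representation. Since no reference code is released, all module and parameter names (`ProjectionHead`, `LatentTimeDirections`, `shift`, `coeffs`) are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """2-layer MLP with a 2048-d hidden layer, matching the quoted setup."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class LatentTimeDirections(nn.Module):
    """M = 64 learnable directions spanning a time-variant subspace of the 2048-d latent space."""
    def __init__(self, num_directions=64, dim=2048):
        super().__init__()
        self.directions = nn.Parameter(torch.randn(num_directions, dim) * 0.01)

    def shift(self, z, coeffs):
        # Move representation z along the learned directions; coeffs is a
        # hypothetical (batch, M) weighting of the temporal displacement.
        return z + coeffs @ self.directions

# Usage sketch with dummy tensors:
head = ProjectionHead()
bank = LatentTimeDirections()
z = torch.randn(8, 2048)             # batch of clip-level features
coeffs = torch.randn(8, 64)          # hypothetical temporal-shift weights
out = head(bank.shift(z, coeffs))    # shifted, then projected, features
```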