Self-Supervised Video Representation Learning via Latent Time Navigation
Authors: Di Yang, Yaohui Wang, Quan Kong, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental analysis suggests that learning video representations by LTN consistently improves performance of action classification in fine-grained and human-oriented tasks (e.g., on Toyota Smarthome dataset). In addition, we demonstrate that our proposed model, when pre-trained on Kinetics-400, generalizes well onto the unseen real world video benchmark datasets UCF101 and HMDB51, achieving state-of-the-art performance in action recognition. |
| Researcher Affiliation | Collaboration | 1Inria, 2004 Rte des Lucioles, Valbonne, France; 2Université Côte d'Azur, 28 Av. de Valrose, Nice, France; 3Toyota Motor Europe, 60 Av. du Bourget, Brussels, Belgium; 4Woven Planet Holdings, 3-2-1 Nihonbashimuromachi, Chuo-ku, Tokyo, Japan; 5Shanghai AI Laboratory, 701 Yunjin Road, Shanghai, China |
| Pseudocode | No | The paper describes the proposed approach through text and diagrams (Figure 2) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement) for the source code of the described methodology. |
| Open Datasets | Yes | We conduct extensive experiments to evaluate LTN on four action classification datasets: Toyota Smarthome, Kinetics400, UCF101 and HMDB51. |
| Dataset Splits | No | The paper references the Smarthome Cross-Subject, Kinetics-400, UCF101, and HMDB51 datasets and discusses evaluation protocols (e.g., linear evaluation, fine-tuning), but does not explicitly state percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | No | The paper states 'This work was granted access to the HPC resources of IDRIS under the allocation AD011011627R1,' but it does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | For the proposed Dt, unless otherwise stated, we set M = 64 directions over the dim = 2048 dimensions... The results shown in Tab. 3 suggest that 2-layer MLP with 2048 dimensions in the hidden layer is the most effective. |
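The Experiment Setup row quotes two concrete hyperparameters: M = 64 time-direction vectors over a dim = 2048 latent space, and a 2-layer MLP with a 2048-dimensional hidden layer. A minimal sketch of those shapes is below; it is not the authors' code (no repository is released), and all names (`projection_head`, `time_directions`) are illustrative assumptions. NumPy stands in for a deep-learning framework, since the paper does not name one.

```python
import numpy as np

# Illustrative sketch of the quoted setup (hypothetical names, not the paper's code):
# M = 64 direction vectors over dim = 2048 dimensions, and a 2-layer MLP
# projection head with a 2048-d hidden layer.
DIM, HIDDEN, M = 2048, 2048, 64
rng = np.random.default_rng(0)

# Weights of the 2-layer MLP: dim -> hidden (ReLU) -> dim.
W1 = rng.standard_normal((DIM, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, DIM)) * 0.01

def projection_head(x):
    """Apply the 2-layer MLP to a batch of clip features (batch, DIM)."""
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer, 2048-d
    return h @ W2

# M = 64 learnable direction vectors spanning the 2048-d latent space,
# along which latent time navigation would shift the representation.
time_directions = rng.standard_normal((M, DIM))

feats = projection_head(rng.standard_normal((8, DIM)))  # shape (8, 2048)
```

The sketch only pins down tensor shapes; how the direction vectors are trained and combined with the features follows the paper's Figure 2, which has no public implementation to check against.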