Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Authors: Kanchana Ranasinghe, Michael S. Ryoo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on action recognition datasets showcase state-of-the-art performance for our learned representations under linear-probing, standard zero-shot, and transductive zero-shot settings. |
| Researcher Affiliation | Academia | Kanchana Ranasinghe, Stony Brook University (kranasinghe@cs.stonybrook.edu); Michael Ryoo, Stony Brook University (mryoo@cs.stonybrook.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our action descriptions will be released publicly along with our codebase. |
| Open Datasets | Yes | We use three standard action recognition benchmark datasets in our experiments: Kinetics400 [70], UCF-101 [71], and HMDB-51 [72]. |
| Dataset Splits | Yes | Kinetics-400 is a large-scale dataset containing 240,000 training videos and 20,000 validation videos belonging to 400 different action classes. |
| Hardware Specification | Yes | We train for 15 epochs using a batch size of 32 across 4 NVIDIA A5000 GPUs using ADAM-W [76, 77] optimizer on the student model with an initial learning rate of 1e-5 following a cosine decay schedule. |
| Software Dependencies | No | The paper mentions using ADAM-W optimizer and building upon SVT and CLIP source code and weights, but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train for 15 epochs using a batch size of 32 across 4 NVIDIA A5000 GPUs using ADAM-W [76, 77] optimizer on the student model with an initial learning rate of 1e-5 following a cosine decay schedule. The EMA teacher is updated from student weights after each training iteration with a decay ratio of 2e-4. (A training-loop sketch of this setup follows the table.) |
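
The reported setup (AdamW at 1e-5 with cosine decay over 15 epochs, batch size 32, and a per-iteration EMA teacher update with decay ratio 2e-4) is compact enough to sketch. The following is a minimal PyTorch sketch of that recipe, not the authors' released implementation (which builds on SVT and CLIP code): the encoder modules, step count, and loss are placeholders, and the 2e-4 decay ratio is interpreted as the per-step interpolation weight toward the student.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder encoders standing in for the CLIP-initialized student/teacher models.
student = torch.nn.Linear(512, 512)
teacher = torch.nn.Linear(512, 512)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never updated by gradients

EPOCHS = 15
STEPS_PER_EPOCH = 1000   # placeholder; depends on dataset size at batch size 32
LR = 1e-5
EMA_DECAY = 2e-4

optimizer = AdamW(student.parameters(), lr=LR)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * STEPS_PER_EPOCH)

@torch.no_grad()
def ema_update(teacher, student, decay=EMA_DECAY):
    # teacher <- (1 - decay) * teacher + decay * student
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(1.0 - decay).add_(s, alpha=decay)

for epoch in range(EPOCHS):
    for step in range(STEPS_PER_EPOCH):
        x = torch.randn(32, 512)                          # dummy batch of features
        loss = (student(x) - teacher(x)).pow(2).mean()    # dummy distillation-style loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()          # cosine decay stepped per iteration
        ema_update(teacher, student)  # teacher updated after every training iteration
```

With a decay ratio of 2e-4, the teacher changes very slowly relative to the student, which is the usual intent of an EMA teacher in self-distillation setups; the exact update convention in the released code may differ.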