Cycle-Contrast for Self-Supervised Video Representation Learning
Authors: Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu Murakami
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the effectiveness of our representation learning approach by using four datasets: Kinetics-400[12], UCF101[25], HMDB51[15] and MMAct[23] under standard evaluation protocols. The learned network backbones are evaluated via two tasks: nearest neighbour retrieval and action recognition. |
| Researcher Affiliation | Industry | Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu Murakami; Lumada Data Science Lab., Hitachi, Ltd.; quan.kong.xz@hitachi.com |
| Pseudocode | No | The paper describes the method using text and diagrams (Figure 1), but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or providing a link to a code repository. |
| Open Datasets | Yes | In this section, we evaluate the effectiveness of our representation learning approach by using four datasets: Kinetics-400[12], UCF101[25], HMDB51[15] and MMAct[23] under standard evaluation protocols. |
| Dataset Splits | Yes | We use the test split1 of UCF101 to evaluate our self-supervised method. ... We pre-train our network by using CCL on Kinetics-400 train split. ... The network is fine-tuned end-to-end as other methods by 35 epochs for all test datasets. |
| Hardware Specification | No | The paper states that training was performed on 4 GPUs, taking 8 days on Kinetics-400 and 0.5 day on UCF101, but it does not specify the GPU model or any other hardware details. |
| Software Dependencies | No | The paper mentions using SGD optimizer and an R3D architecture, but does not provide specific software library names with version numbers. |
| Experiment Setup | Yes | We constrain our experiments to a 3D ResNet. Table 1 provides the specifications of the network. It has L = 8 frames scaled to 128 x 171 and randomly cropped to the size 112 x 112 as the network input... The temperature parameter τ is set to 1 in Eq.2 and Eq.4. Balance parameters w1, w2 and w3 in Eq.6 are set to be 0.2, 0.2 and 0.4. The training was performed on 4 GPUs... Self-Training Phase. We train our network by using CCL on UCF101 train split 1. The mini-batch is set to 48 videos and uses the SGD optimizer with learning rate 0.0001. We divide the learning rate every 20 epochs by 10 for a total of 100 epochs. Weight decay is set to 0.005... We pre-train our network by using CCL on Kinetics-400 train split. The mini-batch is set to 48 videos and uses the SGD optimizer with learning rate 0.01. We divide the learning rate every 20 epochs by 10 for a total of 80 epochs. Weight decay is set to 0.0001. Fine-tune Phase... The network is fine-tuned end-to-end as other methods for 35 epochs for all test datasets. |
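
The Experiment Setup row lists enough optimizer hyperparameters to reconstruct the reported training schedule. Below is a minimal PyTorch sketch of that schedule; since the paper releases no code, the `r3d_18` backbone stand-in, the function name, and the training-loop wiring are assumptions, and only the SGD settings, step schedule, batch size, loss weights, and clip size are taken from the quoted text.

```python
import torch
import torch.nn as nn
import torchvision

def build_optimizer_and_scheduler(model: nn.Module, dataset: str):
    """Sketch of the reported SGD settings for the CCL self-training phase."""
    if dataset == "ucf101":
        # UCF101 train split 1: lr 0.0001, weight decay 0.005, 100 epochs.
        lr, weight_decay, total_epochs = 1e-4, 0.005, 100
    elif dataset == "kinetics400":
        # Kinetics-400 train split: lr 0.01, weight decay 0.0001, 80 epochs.
        lr, weight_decay, total_epochs = 0.01, 0.0001, 80
    else:
        raise ValueError(f"unknown dataset: {dataset}")

    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    # "We divide the learning rate every 20 epochs by 10."
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return optimizer, scheduler, total_epochs


# torchvision's generic R3D-18 stands in for the paper's 3D ResNet backbone.
backbone = torchvision.models.video.r3d_18()
optimizer, scheduler, total_epochs = build_optimizer_and_scheduler(backbone, "kinetics400")

for epoch in range(total_epochs):
    # ... one pass over mini-batches of 48 clips (8 frames, 112 x 112 crops),
    # computing the CCL objective with temperature tau = 1 and loss weights
    # w1 = w2 = 0.2, w3 = 0.4, followed by optimizer.step() ...
    scheduler.step()
```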
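
The Research Type row notes that the learned backbones are evaluated via nearest neighbour retrieval and action recognition. The sketch below shows a generic top-k nearest neighbour retrieval evaluation over extracted clip features; the cosine-similarity metric, the choice of k values, and the hit criterion are common-protocol assumptions rather than details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def topk_retrieval_accuracy(train_feats, train_labels, test_feats, test_labels,
                            ks=(1, 5, 10, 20, 50)):
    """For each test clip, retrieve the k closest training clips by cosine
    similarity; count a hit if any retrieved clip shares the query's label."""
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.t()              # (num_test, num_train)

    results = {}
    for k in ks:
        _, idx = sims.topk(k, dim=1)                 # indices of k nearest train clips
        retrieved = train_labels[idx]                # (num_test, k) retrieved labels
        hits = (retrieved == test_labels.unsqueeze(1)).any(dim=1)
        results[f"top-{k}"] = hits.float().mean().item()
    return results
```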