Cycle-Contrast for Self-Supervised Video Representation Learning

Authors: Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu Murakami

NeurIPS 2020

Reproducibility Variable: Result — LLM Response

Research Type: Experimental
"In this section, we evaluate the effectiveness of our representation learning approach by using four datasets: Kinetics-400 [12], UCF101 [25], HMDB51 [15] and MMAct [23] under standard evaluation protocols. The learned network backbones are evaluated via two tasks: nearest-neighbour retrieval and action recognition."

Researcher Affiliation: Industry
Quan Kong, Wenpeng Wei, Ziwei Deng, Tomoaki Yoshinaga, Tomokazu Murakami — Lumada Data Science Lab., Hitachi, Ltd. (quan.kong.xz@hitachi.com)

Pseudocode: No
The paper describes the method in text and diagrams (Figure 1); it contains no explicit pseudocode or algorithm blocks.

Open Source Code: No
The paper contains no statement about releasing source code and provides no link to a code repository.

Open Datasets: Yes
"In this section, we evaluate the effectiveness of our representation learning approach by using four datasets: Kinetics-400 [12], UCF101 [25], HMDB51 [15] and MMAct [23] under standard evaluation protocols."

Dataset Splits: Yes
"We use the test split 1 of UCF101 to evaluate our self-supervised method. ... We pre-train our network by using CCL on the Kinetics-400 train split. ... The network is fine-tuned end-to-end, as in other methods, for 35 epochs on all test datasets."

Hardware Specification: No
"The training was performed on 4 GPUs, taking 8 days on Kinetics-400 and 0.5 day on UCF101." The number of GPUs is given, but the GPU model and memory are not specified.

Software Dependencies: No
The paper mentions the SGD optimizer and an R3D architecture, but does not name any software libraries with version numbers.

Experiment Setup: Yes
"We constrain our experiments to a 3D ResNet. Table 1 provides the specifications of the network. It has L = 8 frames scaled to 128 x 171 and randomly cropped to the size 112 x 112 as the network input. ... The temperature parameter τ is set to 1 in Eq. 2 and Eq. 4. Balance parameters w1, w2 and w3 in Eq. 6 are set to 0.2, 0.2 and 0.4. The training was performed on 4 GPUs. ... Self-training phase: we train our network by using CCL on UCF101 train split 1. The mini-batch is set to 48 videos, using the SGD optimizer with learning rate 0.0001. We divide the learning rate by 10 every 20 epochs for a total of 100 epochs. Weight decay is set to 0.005. ... We pre-train our network by using CCL on the Kinetics-400 train split. The mini-batch is set to 48 videos, using the SGD optimizer with learning rate 0.01. We divide the learning rate by 10 every 20 epochs for a total of 80 epochs. Weight decay is set to 0.0001. Fine-tune phase: the network is fine-tuned end-to-end, as in other methods, for 35 epochs on all test datasets."
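The review notes that the learned backbones are evaluated via nearest-neighbour retrieval. Since the paper releases no code, the following is only a minimal sketch of what such a protocol typically looks like: each query feature is matched against gallery features by cosine similarity, and a query counts as a hit if any of its top-k neighbours shares its class label. All function names and the toy feature vectors are our own, not the authors'.

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def topk_recall(queries, query_labels, gallery, gallery_labels, k=1):
    """Fraction of queries whose top-k nearest neighbours (by cosine
    similarity) contain the query's class label."""
    hits = 0
    for q, ql in zip(queries, query_labels):
        order = sorted(range(len(gallery)),
                       key=lambda i: cosine(q, gallery[i]),
                       reverse=True)
        if ql in {gallery_labels[i] for i in order[:k]}:
            hits += 1
    return hits / len(queries)

# Toy example (invented data): two gallery clips, two queries.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_labels = ["run", "jump"]
queries = [[0.9, 0.1], [0.2, 0.8]]
query_labels = ["run", "jump"]
print(topk_recall(queries, query_labels, gallery, gallery_labels, k=1))
```

In practice the gallery would hold training-set clip features and the queries test-set features, both extracted from the frozen pre-trained backbone.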
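The reported training recipe uses a simple step decay: the learning rate is divided by 10 every 20 epochs (base 0.0001 for UCF101 self-training, 0.01 for Kinetics-400 pre-training). A minimal sketch of that schedule, assuming a plain epoch-indexed step decay; the function name is ours, not from the paper's (unreleased) code:

```python
def step_lr(base_lr: float, epoch: int, step: int = 20, gamma: float = 0.1) -> float:
    """Learning rate at a given epoch under step decay:
    divide by 1/gamma every `step` epochs."""
    return base_lr * (gamma ** (epoch // step))

# UCF101 self-training phase (base lr 1e-4, 100 epochs):
ucf_schedule = [step_lr(1e-4, e) for e in range(0, 100, 20)]
# Kinetics-400 pre-training phase (base lr 0.01, 80 epochs):
kinetics_schedule = [step_lr(0.01, e) for e in range(0, 80, 20)]
```

In a framework such as PyTorch this corresponds to a standard step scheduler with step size 20 and decay factor 0.1 attached to the SGD optimizer.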