TempCLR: Temporal Alignment Representation with Contrastive Learning

Authors: Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on video retrieval, action step localization, and few-shot action recognition, and achieve consistent performance gain over all three tasks. Detailed ablation studies are provided to justify the approach design.
Researcher Affiliation | Academia | Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang; Columbia University, New York, NY 10027, USA; {yy3035,jiawei.m,shiyuan.h,cl3695,xudong.lin,gh2561,sc250}@columbia.edu
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code Link: https://github.com/yyuncong/TempCLR
Open Datasets | Yes | We follow Xu et al. (2021) and use HowTo100M (HT100M) (Miech et al., 2019) for pre-training. ... We evaluate the model pretrained with our TempCLR on YouCook II (Zhou et al., 2018) without any finetuning... We perform evaluation on CrossTask (Zhukov et al., 2019)...
Dataset Splits | Yes | The subset contains 100 classes, where |Cb| = 64 and |Cn| = 36; 24 (12) of the novel classes are for evaluation (validation).
Hardware Specification | Yes | we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours
Software Dependencies | No | The paper mentions using the 'Adam (Kingma & Ba, 2014) optimizer' and Transformer architectures, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | During pre-training, we set the batch size of captions to 256, and the captions are sampled from 16 long videos. As we train from the VideoCLIP pre-trained model, when all parameters are updated we set the learning rate to 1e-5. Then, for comparison as an ablation study, we only update the parameters in the norm layers (such as layer norm) and set the learning rate to 2e-5. In both cases, we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours and use the default hyper-parameters in the Adam (Kingma & Ba, 2014) optimizer with betas of (0.9, 0.98). (See the optimizer sketch after this table.)
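
The reported optimizer settings can be summarized in a minimal PyTorch sketch. This is not the authors' implementation (see their repository above for the actual code): the placeholder model, the `FINETUNE_NORM_LAYERS_ONLY` flag, and the layer-norm filtering loop are assumptions used only to illustrate the two learning-rate regimes (1e-5 for full fine-tuning, 2e-5 when only norm-layer parameters are updated) and the Adam betas of (0.9, 0.98).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the VideoCLIP-style pre-trained model;
# the real model comes from the authors' repository.
model = nn.Transformer()

FINETUNE_NORM_LAYERS_ONLY = False  # ablation: update only layer-norm parameters

if FINETUNE_NORM_LAYERS_ONLY:
    # Freeze everything, then re-enable gradients for LayerNorm parameters only.
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
                trainable.append(p)
    lr = 2e-5  # reported learning rate for the norm-layer-only ablation
else:
    trainable = list(model.parameters())
    lr = 1e-5  # reported learning rate when all parameters are trained

# Adam with the betas reported in the paper; other settings left at PyTorch defaults.
optimizer = torch.optim.Adam(trainable, lr=lr, betas=(0.9, 0.98))
```

The batch construction (256 captions sampled from 16 long videos) and the contrastive training objective itself are omitted from this sketch.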