TempCLR: Temporal Alignment Representation with Contrastive Learning
Authors: Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on video retrieval, action step localization, and few-shot action recognition, and achieve consistent performance gain over all three tasks. Detailed ablation studies are provided to justify the approach design. |
| Researcher Affiliation | Academia | Yuncong Yang Jiawei Ma Shiyuan Huang Long Chen Xudong Lin Guangxing Han Shih-Fu Chang Columbia University, New York, NY 10027, USA {yy3035,jiawei.m,shiyuan.h,cl3695,xudong.lin,gh2561,sc250}@columbia.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code Link: https://github.com/yyuncong/TempCLR |
| Open Datasets | Yes | We follow Xu et al. (2021) and use HowTo100M (HT100M) (Miech et al., 2019) for pre-training. ... We evaluate the model pretrained with our TempCLR on YouCookII (Zhou et al., 2018) without any finetuning... We perform evaluation on CrossTask (Zhukov et al., 2019)... |
| Dataset Splits | Yes | The subset contains 100 classes, where \|Cb\| = 64 and \|Cn\| = 24 (12) classes are for evaluation (validation). |
| Hardware Specification | Yes | we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014) optimizer' and Transformer architectures, but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | during pre-training, we set the batch size of captions to 256, and the captions are sampled from 16 long videos. As we trained from the VideoCLIP pre-trained model, when all parameters are trained, we set the learning rate to 1e-5. Then, for comparison, as an ablation study, we only update the parameters in the norm layers such as layer norm, and the learning rate is set to 2e-5. In both cases, we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours and use the default hyper-parameters in the Adam (Kingma & Ba, 2014) optimizer with betas of (0.9, 0.98). |
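The two fine-tuning configurations quoted above (full-model training at lr 1e-5 vs. norm-layers-only at lr 2e-5) can be sketched as a parameter-selection routine. This is a minimal illustration, not the authors' code; the parameter names below are hypothetical, and real checkpoints (e.g. VideoCLIP) use their own naming schemes.

```python
# Hedged sketch of the two ablation configurations described in the paper.
# Assumption: norm-layer parameters are identifiable by "norm" in their name.

ADAM_BETAS = (0.9, 0.98)  # betas reported for the Adam optimizer

def training_config(param_names, norm_only=False):
    """Return (trainable parameter names, learning rate) for one ablation.

    Full fine-tuning:     all parameters,        lr = 1e-5
    Norm-only ablation:   layer-norm parameters, lr = 2e-5
    """
    if norm_only:
        trainable = [n for n in param_names if "norm" in n.lower()]
        lr = 2e-5
    else:
        trainable = list(param_names)
        lr = 1e-5
    return trainable, lr

# Example with made-up parameter names:
names = [
    "encoder.layer0.attn.weight",
    "encoder.layer0.LayerNorm.weight",
    "encoder.layer0.LayerNorm.bias",
]
print(training_config(names, norm_only=True))
# -> (['encoder.layer0.LayerNorm.weight', 'encoder.layer0.LayerNorm.bias'], 2e-05)
```

The selected names would then be passed to an Adam optimizer with `betas=ADAM_BETAS`, with all other parameters frozen.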