TempCLR: Temporal Alignment Representation with Contrastive Learning
Authors: Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on video retrieval, action step localization, and few-shot action recognition, and achieve consistent performance gain over all three tasks. Detailed ablation studies are provided to justify the approach design. |
| Researcher Affiliation | Academia | Yuncong Yang Jiawei Ma Shiyuan Huang Long Chen Xudong Lin Guangxing Han Shih-Fu Chang Columbia University, New York, NY 10027, USA {yy3035,jiawei.m,shiyuan.h,cl3695,xudong.lin,gh2561,sc250}@columbia.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code Link: https://github.com/yyuncong/TempCLR |
| Open Datasets | Yes | We follow Xu et al. (2021) and use HowTo100M (HT100M) (Miech et al., 2019) for pre-training. ... We evaluate the model pretrained with our TempCLR on YouCookII (Zhou et al., 2018) without any finetuning... We perform evaluation on CrossTask (Zhukov et al., 2019)... |
| Dataset Splits | Yes | The subset contains 100 classes, where \|Cb\| = 64 and \|Cn\| = 24 (12) classes are for evaluation (validation). |
| Hardware Specification | Yes | we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014) optimizer' and Transformer architectures, but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | during pre-training, we set the batch size of captions to 256, and the captions are sampled from 16 long videos. As we trained from the VideoCLIP pre-trained model, when all parameters are trained, we set the learning rate to 1e-5. Then, for comparison, as an ablation study, we only update the parameters in the norm layers such as layer norm, and the learning rate is set to 2e-5. In both cases, we train the model on 2 NVIDIA TITAN RTX GPUs (each with 24 GB memory) for 10 epochs within two hours and use the default hyper-parameters in the Adam (Kingma & Ba, 2014) optimizer with betas of (0.9, 0.98). |
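The two fine-tuning configurations quoted above (full-model training at lr 1e-5 vs. norm-layers-only at lr 2e-5) can be sketched as a parameter-selection routine. This is a minimal illustration, not the authors' code; the parameter names below are hypothetical, and real checkpoints (e.g. VideoCLIP) use their own naming schemes.

```python
# Hedged sketch of the two ablation configurations described in the paper.
# Assumption: norm-layer parameters are identifiable by "norm" in their name.

ADAM_BETAS = (0.9, 0.98)  # betas reported for the Adam optimizer

def training_config(param_names, norm_only=False):
    """Return (trainable parameter names, learning rate) for one ablation.

    Full fine-tuning:     all parameters,        lr = 1e-5
    Norm-only ablation:   layer-norm parameters, lr = 2e-5
    """
    if norm_only:
        trainable = [n for n in param_names if "norm" in n.lower()]
        lr = 2e-5
    else:
        trainable = list(param_names)
        lr = 1e-5
    return trainable, lr

# Example with made-up parameter names:
names = [
    "encoder.layer0.attn.weight",
    "encoder.layer0.LayerNorm.weight",
    "encoder.layer0.LayerNorm.bias",
]
print(training_config(names, norm_only=True))
# -> (['encoder.layer0.LayerNorm.weight', 'encoder.layer0.LayerNorm.bias'], 2e-05)
```

The selected names would then be passed to an Adam optimizer with `betas=ADAM_BETAS`, with all other parameters frozen.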