TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification

Authors: Chenyang Yu, Xuehu Liu, Yingquan Wang, Pingping Zhang, Huchuan Lu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our proposed method shows much better results than other state-of-the-art methods on MARS, LS-VID and iLIDS-VID.
Researcher Affiliation | Academia | School of Information and Communication Engineering, Dalian University of Technology, Dalian, China; School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China; School of Future Technology, School of Artificial Intelligence, Dalian University of Technology, Dalian, China; Ningbo Institute, Dalian University of Technology, Ningbo, China
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The code is available at https://github.com/AsuradaYuci/TF-CLIP.
Open Datasets | Yes | We evaluate our proposed approach on three video-based person ReID benchmarks, including MARS (Zheng et al. 2016), LS-VID (Li et al. 2019) and iLIDS-VID (Wang et al. 2014).
Dataset Splits | No | The paper describes sampling and mini-batch sizes for training but does not explicitly provide percentages or sample counts for the train/validation/test splits of the datasets.
Hardware Specification | Yes | Our model is implemented on the PyTorch platform and trained with one NVIDIA Tesla A30 GPU (24G memory).
Software Dependencies | No | The paper mentions the 'PyTorch platform' but does not specify a version number or list other software dependencies with versions.
Experiment Setup | Yes | During training, we sample 8 frames from each video sequence and each frame is resized to 256×128. In each mini-batch, we sample 4 identities, each with 4 tracklets. Thus, the number of images in a batch is 4×4×8 = 128. We also adopt random flipping and random erasing (Zhong et al. 2020) for data augmentation. We train our framework for 60 epochs in total by the Adam optimizer (Kingma and Ba 2014). Following CLIP-ReID (Li, Sun, and Li 2022), we first warm up the model for 10 epochs with a linearly growing learning rate from 5×10⁻⁷ to 5×10⁻⁶. Then, the learning rate is divided by 10 at the 30th and 50th epochs.
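
To make the reported optimization schedule concrete, the sketch below sets up Adam with a 10-epoch linear warm-up from 5×10⁻⁷ to 5×10⁻⁶, followed by tenfold learning-rate decays at epochs 30 and 50 over 60 total epochs. This is a minimal illustration against a placeholder model, not the authors' released code; the exact warm-up interpolation and all variable names here are assumptions.

```python
import torch

# Hypothetical stand-in for the TF-CLIP model (the real model is not reproduced here).
model = torch.nn.Linear(512, 512)

base_lr = 5e-6          # target learning rate after warm-up
warmup_start_lr = 5e-7  # starting learning rate
warmup_epochs = 10      # linear warm-up length reported in the paper
total_epochs = 60       # total training epochs reported in the paper

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

def lr_at_epoch(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch under the reported schedule."""
    if epoch < warmup_epochs:
        # Linear warm-up from 5e-7 to 5e-6 over the first 10 epochs
        # (assumed to reach the target at the last warm-up epoch).
        frac = epoch / max(warmup_epochs - 1, 1)
        return warmup_start_lr + frac * (base_lr - warmup_start_lr)
    lr = base_lr
    if epoch >= 30:
        lr /= 10.0  # first decay at the 30th epoch
    if epoch >= 50:
        lr /= 10.0  # second decay at the 50th epoch
    return lr

for epoch in range(total_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... iterate over mini-batches of 4 identities x 4 tracklets x 8 frames (128 images) ...
```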