Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval

Authors: Likai Tian, Zhengwei Yang, Zechao Hu, Hao Li, Yifang Yin, Zheng Wang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The extensive experiments on two standard video-to-shop datasets, Moving Fashion [Godi et al., 2022] and Deep Fashion2 [Ge et al., 2019], demonstrate the superiority of the proposed SF-CLIP.
Researcher Affiliation | Academia | (1) National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science, Wuhan University, China; (2) Hubei Key Laboratory of Multimedia and Network Communication Engineering; (3) Institute for Infocomm Research, A*STAR, Singapore
Pseudocode | No | The paper includes diagrams (Figure 3) illustrating processes but does not provide pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a direct link to a code repository for its methodology. It only cites a third-party tool: "https://github.com/open-mmlab/mmpose, 2020."
Open Datasets | Yes | Moving Fashion [Godi et al., 2022] is a VSR dataset consisting of over 15,000 pairs of videos and corresponding online clothing items. Multi-Deep Fashion2 [Godi et al., 2022] is derived from the original Deep Fashion2 dataset [Ge et al., 2019], which is primarily used for street-to-shop image retrieval.
Dataset Splits | No | The paper mentions using 1024 samples for pseudo-label generation and defining salient frame numbers for training and testing, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for the main datasets (Moving Fashion, Deep Fashion2) used in its experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using "Adam [Kingma and Ba, 2014]" as an optimizer, but it does not specify version numbers for Adam or any other software dependencies like programming languages, libraries, or frameworks.
Experiment Setup | Yes | During the pseudo-label generation process for the occlusion aspect, we set the dropout rate to 0.5 and λ_occ to 0.9. As for the truncation aspect, we set the confidence threshold λ to 0.8 and λ_tru to 0.6. The learning rate for the Adapter is set to 3 × 10⁻⁴ and for the Text Encoder to 5 × 10⁻⁷, with a total of 30 epochs. ... The module is trained using Adam [Kingma and Ba, 2014] with a learning rate of 1 × 10⁻⁴ for a total of 60 epochs. The salient frame number is defined as 3 for training and 10 for testing.
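
For readers who want to sanity-check the reported hyperparameters, the sketch below wires the quoted values into a PyTorch-style configuration. This is a minimal illustration assuming PyTorch; the component names (adapter, text_encoder, salient_module) are hypothetical stand-ins rather than the authors' code, and only the numeric values come from the paper's text.

```python
# Minimal sketch of the reported training configuration (assumed PyTorch).
# NOTE: adapter / text_encoder / salient_module are hypothetical placeholders;
# only the hyperparameter values below are quoted from the paper.
import torch
import torch.nn as nn

# Placeholder networks standing in for the paper's components.
adapter = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Dropout(p=0.5))  # dropout rate 0.5
text_encoder = nn.Linear(512, 512)    # stand-in for the CLIP text encoder
salient_module = nn.Linear(512, 512)  # stand-in for the separately trained module

# Pseudo-label generation thresholds quoted from the paper.
LAMBDA_OCC = 0.9        # occlusion aspect
CONF_THRESHOLD = 0.8    # confidence threshold (truncation aspect)
LAMBDA_TRU = 0.6        # truncation aspect

# Adapter + Text Encoder training: per-group learning rates, 30 epochs.
optimizer_clip = torch.optim.Adam([
    {"params": adapter.parameters(), "lr": 3e-4},
    {"params": text_encoder.parameters(), "lr": 5e-7},
])
EPOCHS_CLIP = 30

# Separately trained module: Adam with lr 1e-4, 60 epochs.
optimizer_module = torch.optim.Adam(salient_module.parameters(), lr=1e-4)
EPOCHS_MODULE = 60

# Salient frame counts reported for training and testing.
NUM_SALIENT_FRAMES_TRAIN = 3
NUM_SALIENT_FRAMES_TEST = 10
```

The per-parameter-group learning rates mirror the quoted setup: the CLIP text encoder is kept nearly frozen at 5 × 10⁻⁷ while the lightweight adapter trains at a conventional 3 × 10⁻⁴.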