Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval
Authors: Likai Tian, Zhengwei Yang, Zechao Hu, Hao Li, Yifang Yin, Zheng Wang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments on two standard video-to-shop datasets, MovingFashion [Godi et al., 2022] and DeepFashion2 [Ge et al., 2019], demonstrate the superiority of the proposed SF-CLIP. |
| Researcher Affiliation | Academia | ¹National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, School of Computer Science, Wuhan University, China; ²Hubei Key Laboratory of Multimedia and Network Communication Engineering; ³Institute for Infocomm Research, A*STAR, Singapore |
| Pseudocode | No | The paper includes diagrams (Figure 3) illustrating processes but does not provide pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a direct link to a code repository for its methodology. It only cites a third-party tool: "https://github.com/open-mmlab/mmpose, 2020." |
| Open Datasets | Yes | MovingFashion [Godi et al., 2022] is a VSR dataset consisting of over 15,000 pairs of videos and corresponding online clothing items. Multi-DeepFashion2 [Godi et al., 2022]: the original DeepFashion2 dataset [Ge et al., 2019] is primarily used for street-to-shop image retrieval. |
| Dataset Splits | No | The paper mentions using 1024 samples for pseudo-label generation and defining salient frame numbers for training and testing, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for the main datasets (MovingFashion, DeepFashion2) used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using Adam [Kingma and Ba, 2014] as its optimizer, but it does not specify version numbers for any software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | During the pseudo-label generation process of the occlusion aspect, we set the dropout rate to 0.5 and λ_occ to 0.9. As for the truncation aspect, we set the confidence threshold λ to 0.8 and λ_tru to 0.6. The learning rate for the Adapter is set to 3 × 10⁻⁴, and for the Text Encoder to 5 × 10⁻⁷, with a total of 30 epochs. ... The module is trained using Adam [Kingma and Ba, 2014] with a learning rate of 1 × 10⁻⁴ for a total of 60 epochs. The salient frame number is defined as 3 for training and 10 for testing. |
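For a reproduction attempt, the reported values above can be collected into one place. The sketch below is a minimal, hypothetical illustration in PyTorch: the `adapter` and `text_encoder` modules are placeholder stand-ins (the paper releases no code, so their real architectures are unknown), and only the numeric hyperparameters are taken from the paper's experiment setup.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
DROPOUT_RATE = 0.5        # pseudo-label generation, occlusion aspect
LAMBDA_OCC = 0.9          # occlusion threshold λ_occ
CONF_THRESHOLD = 0.8      # truncation aspect, confidence threshold λ
LAMBDA_TRU = 0.6          # truncation threshold λ_tru
LR_ADAPTER = 3e-4         # learning rate for the Adapter
LR_TEXT_ENCODER = 5e-7    # learning rate for the Text Encoder
EPOCHS_CLIP_STAGE = 30    # fashion-aware CLIP tuning stage
LR_MODULE = 1e-4          # learning rate for the retrieval module
EPOCHS_MODULE = 60        # retrieval-module training stage
SALIENT_FRAMES_TRAIN = 3  # salient frames used during training
SALIENT_FRAMES_TEST = 10  # salient frames used during testing

# Placeholder modules standing in for the paper's Adapter and CLIP text
# encoder; these dummy layers exist only so the optimizer sketch runs.
adapter = nn.Linear(512, 512)
text_encoder = nn.Linear(512, 512)

# Two Adam parameter groups mirroring the paper's distinct learning
# rates for the Adapter and the Text Encoder.
optimizer = torch.optim.Adam([
    {"params": adapter.parameters(), "lr": LR_ADAPTER},
    {"params": text_encoder.parameters(), "lr": LR_TEXT_ENCODER},
])
```

The two parameter groups reflect the paper's use of separate learning rates (3 × 10⁻⁴ for the Adapter, 5 × 10⁻⁷ for the Text Encoder); everything else about the training loop, loss, and data pipeline is unspecified in the paper and omitted here.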