Easy Regional Contrastive Learning of Expressive Fashion Representations

Authors: Daiqing Qi, Handong Zhao, Sheng Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments with existing benchmark datasets, including our new benchmark dataset (Fig. 4) for fashion cross-modal retrieval, which differs from the popular benchmark Fashion-Gen [36] in its notably larger size, wider variety of brands and products, more concise and general descriptions, and more diverse image scopes, making it more challenging and more practical.
Researcher Affiliation | Collaboration | Daiqing Qi (University of Virginia, Charlottesville, VA 22904, daiqing.qi@virginia.edu); Handong Zhao (Adobe Research, San Jose, CA 95110, hazhao@adobe.com); Sheng Li (University of Virginia, Charlottesville, VA 22904, shengli@virginia.edu)
Pseudocode | No | The paper describes the method using figures and text but does not include an explicit pseudocode or algorithm block.
Open Source Code | Yes | We will provide open access to both data and source code with sufficient instructions.
Open Datasets | Yes | For a fair comparison, we first evaluate our model on the benchmark dataset Fashion-Gen [36], following existing works [10, 56, 12, 27]. In addition, we collect text descriptions and product images in the fashion domain from Amazon Reviews [29] and build a large-scale fashion dataset containing 1.3M image-text pairs, of which 910K and 390K pairs are used for training and test, respectively.
Dataset Splits | No | The paper specifies training and test splits for the datasets (e.g., '910K and 390K pairs for training and test, respectively' for Amazon Fashion, and '260,480 and 35,528 image-text pairs for training and testing, respectively' for Fashion-Gen), but does not explicitly mention a separate validation split for either dataset.
Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper describes experimental settings like training epochs and learning rates but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | For Fashion-Gen, following [27, 10], the model is trained for 20 epochs. The weight decay is set to 1e-4, and the learning rate is set to 5e-5 with a cosine annealing learning rate decay scheduler. Because the selection tokens are randomly initialized, they are first matched to the pre-trained CLIP model by freezing the other parameters and training the selection tokens alone with an initial learning rate of 5e-4 for 5 epochs. The default batch size is 64. The same configuration is used for Amazon Fashion, except that the number of epochs is set to 10.
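
Read as a training recipe, the setup above amounts to a two-stage schedule: a short warm-up of the selection tokens with the CLIP backbone frozen, followed by end-to-end fine-tuning. The sketch below is a minimal, hypothetical PyTorch rendering of that schedule, not the authors' released code: the optimizer choice (AdamW), the loss function, the data loader, and the attribute names `model.selection_tokens` and `model.clip_params()` are assumptions; only the learning rates, weight decay, cosine annealing, epoch counts, and batch size come from the settings quoted above.

```python
# Minimal sketch of the two-stage schedule described above (PyTorch).
# Assumptions: `model.selection_tokens` and `model.clip_params()` are hypothetical
# handles for the randomly initialized selection tokens and the pre-trained CLIP
# weights; `contrastive_loss` and `train_loader` are placeholders.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def run_stage(model, params, lr, epochs, train_loader, weight_decay=1e-4):
    optimizer = AdamW(params, lr=lr, weight_decay=weight_decay)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # cosine annealing decay
    for _ in range(epochs):
        for images, texts in train_loader:  # default batch size 64
            loss = contrastive_loss(model(images, texts))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()

# Stage 1: freeze the pre-trained CLIP weights and warm up the selection tokens.
for p in model.clip_params():
    p.requires_grad = False
run_stage(model, [model.selection_tokens], lr=5e-4, epochs=5, train_loader=train_loader)

# Stage 2: unfreeze everything and train end to end
# (20 epochs on Fashion-Gen, 10 on Amazon Fashion).
for p in model.clip_params():
    p.requires_grad = True
run_stage(model, model.parameters(), lr=5e-5, epochs=20, train_loader=train_loader)
```

The two learning rates mirror the stated motivation: the randomly initialized selection tokens need a larger warm-up rate (5e-4) to align with the frozen pre-trained CLIP model before the whole network is fine-tuned at the smaller rate of 5e-5.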