Easy Regional Contrastive Learning of Expressive Fashion Representations
Authors: Daiqing Qi, Handong Zhao, Sheng Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on existing benchmark datasets as well as our new benchmark dataset (Fig. 4) for fashion cross-modal retrieval, which differs from the popular benchmark Fashion Gen [36] in its notably larger size, wider variety of brands and products, more concise and general descriptions, and more diverse image scopes, making it more challenging and more practical. |
| Researcher Affiliation | Collaboration | Daiqing Qi (University of Virginia, Charlottesville, VA 22904, daiqing.qi@virginia.edu); Handong Zhao (Adobe Research, San Jose, CA 95110, hazhao@adobe.com); Sheng Li (University of Virginia, Charlottesville, VA 22904, shengli@virginia.edu) |
| Pseudocode | No | The paper describes the method using figures and text but does not include an explicit pseudocode or algorithm block. |
| Open Source Code | Yes | We will provide open access to both data and source code with sufficient instructions. |
| Open Datasets | Yes | For a fair comparison, we first evaluate our model on the benchmark dataset Fashion Gen [36], following existing works [10, 56, 12, 27]. Besides, we also collect text descriptions and product images in the fashion domain from Amazon Reviews [29] and build a large-scale fashion dataset which contains 1.3M image-text pairs, where we use 910K and 390K pairs for training and test, respectively. |
| Dataset Splits | No | The paper specifies training and test splits for the datasets (e.g., '910K and 390K pairs for training and test, respectively' for Amazon Fashion, and '260,480 and 35,528 image-text pairs for training and testing, respectively' for Fashion Gen), but does not explicitly mention a separate validation split for either dataset. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper describes experimental settings like training epochs and learning rates but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | For Fashion Gen, following [27, 10], the model is trained for 20 epochs. The weight decay is set to 1e-4, and the learning rate is set to 5e-5 with the cosine annealing learning rate decay scheduler applied. As the selection tokens are randomly initialized, to match them with the pre-trained CLIP model, we first freeze the other parameters and train the selection tokens with an initial learning rate of 5e-4 for 5 epochs. The default batch size is 64. The configurations for Amazon Fashion are the same, except that the model is trained for 10 epochs. (A hedged sketch of this schedule is given after the table.) |
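
The experiment-setup row above amounts to a two-stage fine-tuning schedule: a short warm-up of the randomly initialized selection tokens against a frozen CLIP backbone, followed by full fine-tuning with cosine annealing. Below is a minimal PyTorch sketch of that schedule using the reported hyperparameters; the optimizer choice (AdamW), the attribute name `model.selection_tokens`, the dataset object, and the loss call `model(images, texts)` are illustrative assumptions, not details specified in the paper.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader


def train(model, dataset, epochs=20, batch_size=64):
    """Two-stage schedule from the reported setup (epochs=20 for Fashion Gen, 10 for Amazon Fashion)."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Stage 1: freeze the pre-trained CLIP weights and warm up only the
    # randomly initialized selection tokens (lr 5e-4, 5 epochs).
    # `model.selection_tokens` is a hypothetical submodule name.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.selection_tokens.parameters():
        p.requires_grad = True
    warmup_opt = AdamW(model.selection_tokens.parameters(), lr=5e-4, weight_decay=1e-4)
    for _ in range(5):
        for images, texts in loader:
            loss = model(images, texts)  # contrastive objective, per the paper
            warmup_opt.zero_grad()
            loss.backward()
            warmup_opt.step()

    # Stage 2: unfreeze everything and fine-tune the full model
    # (lr 5e-5, weight decay 1e-4, cosine annealing over the training epochs).
    for p in model.parameters():
        p.requires_grad = True
    opt = AdamW(model.parameters(), lr=5e-5, weight_decay=1e-4)
    sched = CosineAnnealingLR(opt, T_max=epochs)
    for _ in range(epochs):
        for images, texts in loader:
            loss = model(images, texts)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
```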