FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval

Authors: Yanzhe Chen, Huasong Zhong, Xiangteng He, Yuxin Peng, Jiahuan Zhou, Lele Cheng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments demonstrate our approach's state-of-the-art performance on four commonly used datasets.
Researcher Affiliation | Collaboration | 1 Wangxuan Institute of Computer Technology, Peking University; 2 Kuaishou Technology
Pseudocode | No | Information insufficient. The paper describes its models and processes using text and mathematical equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Information insufficient. The paper does not provide an explicit statement or link for the open-source code of its proposed method.
Open Datasets | Yes | We conduct extensive experiments on four commonly used datasets, namely Fashion IQ (Wu et al. 2021), Fashion200K (Han et al. 2017), CIRR (Liu et al. 2021) and Shoes (Berg, Berg, and Shih 2010).
Dataset Splits | Yes | CIRR (Liu et al. 2021): The dataset contains 21,552 real-world images from NLVR2 (Suhr et al. 2018). There are 36,554 triplets in total, divided into 3 subsets with 80% in training, 10% in validation, and 10% in testing.
Hardware Specification | Yes | We use 8 Tesla V100 GPUs for model training.
Software Dependencies | No | Information insufficient. The paper mentions the use of an optimizer (Adam) and model backbones (ResNet50x4, ViT-B/16) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The initial learning rate is 4e-5, and we adopt a cosine annealing strategy to adjust it. The total number of training epochs is 50. We use Adam (Kingma and Ba 2014) to optimize the network with a mini-batch size of 1024.
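
The reported training configuration maps onto a standard optimizer-plus-scheduler setup. Below is a minimal, runnable PyTorch sketch of those settings (Adam, initial learning rate 4e-5, cosine annealing, 50 epochs, mini-batch size 1024). The toy model, loss, and random data are placeholders for illustration only; they are not the paper's FashionERN architecture, loss, or datasets, since the paper's code is not public.

```python
# Sketch of the reported optimization settings in a generic PyTorch loop.
# The model, loss, and data below are illustrative placeholders.
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 50        # total number of training epochs (as reported)
BATCH_SIZE = 1024  # mini-batch size (as reported)
INIT_LR = 4e-5     # initial learning rate (as reported)

# Placeholder model and data; a real run would use the paper's model and the
# Fashion IQ / Fashion200K / CIRR / Shoes triplets.
model = nn.Linear(512, 512)
dummy = TensorDataset(torch.randn(4096, 512), torch.randn(4096, 512))
loader = DataLoader(dummy, batch_size=BATCH_SIZE, shuffle=True)

optimizer = Adam(model.parameters(), lr=INIT_LR)
# Cosine annealing adjusts the learning rate once per epoch over the 50 epochs.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    for x, target in loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), target)  # placeholder loss
        loss.backward()
        optimizer.step()
    scheduler.step()
```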