Cross-Modal Learning with Adversarial Samples

Authors: Chao Li, Shangqian Gao, Cheng Deng, De Xie, Wei Liu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two cross-modal benchmark datasets show that the adversarial examples produced by our CMLA are efficient in fooling a target deep cross-modal hashing network.
Researcher Affiliation | Collaboration | Chao Li (1,2), Cheng Deng (1), Shangqian Gao (2), De Xie (1), Wei Liu (3); (1) School of Electronic Engineering, Xidian University, Xi'an, Shaanxi, China; (2) Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA; (3) Tencent AI Lab, China
Pseudocode | Yes | Algorithm 1: Cross-Modal correlation Learning with Adversarial samples (CMLA).
Open Source Code | No | The paper states that the source code of the DCMH and SSAH baselines was provided by their authors, but it gives no explicit statement or link for the code of the proposed CMLA method.
Open Datasets | Yes | Extensive experiments are conducted on two benchmarks: MIRFlickr-25K [22] and NUS-WIDE [10].
Dataset Splits | Yes | For MIRFlickr-25K, 2,000 data points are randomly selected as a query set, 10,000 data points are used as a training set to train the target retrieval network model, and the remainder is kept as a retrieval database; 5,000 data points from the training set are further sampled to learn adversarial samples. For NUS-WIDE, 2,100 data points are randomly sampled as a query set and 10,500 data points as a training set. (See the split sketch after this table.)
Hardware Specification | Yes | Our proposed CMLA is implemented via TensorFlow [1] and is run on a server with two NVIDIA Tesla P40 GPUs, each with 24 GB of graphics memory.
Software Dependencies | Yes | Our proposed CMLA is implemented via TensorFlow [1].
Experiment Setup | Yes | All images are resized to 224 × 224 × 3 before being used as inputs. In adversarial sample learning, the Adam optimizer is used with initial learning rates of 0.5 and 0.002 for the image and text modalities, respectively, and each sample is trained for Tmax iterations. All hyper-parameters α, β, λ, ξ, γ, and η are set to 1 empirically. The mini-batch size is fixed at 128. ϵv is set to 8 for the image modality, and ϵt is set to 0.01 for the text modality. (See the setup sketch after this table.)
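
For concreteness, here is a minimal sketch of the dataset partitioning quoted in the Dataset Splits row, using the MIRFlickr-25K sizes. The function name, random seed, total sample count, and return structure are illustrative assumptions, not taken from the paper's (unreleased) code.

```python
import numpy as np

def split_indices(num_samples, query_size, train_size, adv_size, seed=0):
    """Randomly partition dataset indices into query / training / retrieval sets,
    and sub-sample part of the training set for adversarial sample learning.
    Sizes follow the MIRFlickr-25K protocol quoted above; everything else
    (name, seed, total count) is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)

    query = perm[:query_size]                          # 2,000 query points
    train = perm[query_size:query_size + train_size]   # 10,000 training points
    retrieval = perm[query_size + train_size:]         # remainder: retrieval database
    adv = rng.choice(train, size=adv_size, replace=False)  # 5,000 points for adversarial learning

    return {"query": query, "train": train, "retrieval": retrieval, "adversarial": adv}

# MIRFlickr-25K protocol: 2,000 query / 10,000 training / rest retrieval, 5,000 adversarial
splits = split_indices(num_samples=25_000, query_size=2_000, train_size=10_000, adv_size=5_000)
```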
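
The Experiment Setup row can likewise be read as the following TensorFlow sketch of per-sample adversarial learning. The loss function `cmla_loss_fn`, the variable names, the placeholder value of Tmax, and the choice to enforce the ϵ bounds by clipping the perturbations to [-ϵ, ϵ] are assumptions for illustration; only the learning rates, batch size, and ϵ values come from the quoted setup.

```python
import tensorflow as tf

# Hyper-parameters quoted in the Experiment Setup row; T_MAX is the paper's
# unspecified per-sample iteration budget (the value here is a placeholder).
BATCH_SIZE = 128
LR_IMAGE, LR_TEXT = 0.5, 0.002
EPS_IMAGE, EPS_TEXT = 8.0, 0.01
T_MAX = 100  # placeholder

opt_image = tf.keras.optimizers.Adam(learning_rate=LR_IMAGE)
opt_text = tf.keras.optimizers.Adam(learning_rate=LR_TEXT)

def learn_adversarial_samples(images, texts, cmla_loss_fn):
    """Iteratively learn additive perturbations for one mini-batch.
    `cmla_loss_fn` stands in for the CMLA objective defined in the paper;
    projecting the perturbations into [-eps, eps] is an assumption about
    how the per-modality epsilon bounds are enforced."""
    delta_v = tf.Variable(tf.zeros_like(images))  # image-modality perturbation
    delta_t = tf.Variable(tf.zeros_like(texts))   # text-modality perturbation

    for _ in range(T_MAX):
        with tf.GradientTape() as tape:
            loss = cmla_loss_fn(images + delta_v, texts + delta_t)
        grads = tape.gradient(loss, [delta_v, delta_t])
        opt_image.apply_gradients([(grads[0], delta_v)])
        opt_text.apply_gradients([(grads[1], delta_t)])
        # Keep each perturbation within its modality's epsilon bound.
        delta_v.assign(tf.clip_by_value(delta_v, -EPS_IMAGE, EPS_IMAGE))
        delta_t.assign(tf.clip_by_value(delta_t, -EPS_TEXT, EPS_TEXT))

    return images + delta_v, texts + delta_t
```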