Cross-Modal Learning with Adversarial Samples
Authors: Chao Li, Shangqian Gao, Cheng Deng, De Xie, Wei Liu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two cross-modal benchmark datasets show that the adversarial examples produced by our CMLA are efficient in fooling a target deep cross-modal hashing network. |
| Researcher Affiliation | Collaboration | Chao Li (1,2), Cheng Deng (1), Shangqian Gao (2), De Xie (1), Wei Liu (3). 1: School of Electronic Engineering, Xidian University, Xi'an, Shaanxi, China; 2: Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA; 3: Tencent AI Lab, China |
| Pseudocode | Yes | Algorithm 1 Cross-Modal correlation Learning with Adversarial samples (CMLA). |
| Open Source Code | No | The paper states that source codes of DCMH and SSAH (baselines) were provided by authors, but it does not provide an explicit statement or link for the code of their proposed CMLA method. |
| Open Datasets | Yes | Extensive experiments on two benchmarks: MIRFlickr-25K [22] and NUS-WIDE [10] are conducted |
| Dataset Splits | Yes | For MIRFlickr-25K, 2,000 data points are randomly selected as a query set, 10,000 data points are used as a training set to train the target retrieval network model, and the remainder is kept as a retrieval database. 5,000 data points from the training set are further sampled to learn adversarial samples. For NUS-WIDE, we randomly sample 2,100 data points as a query set and 10,500 data points as a training set. (See the split sketch after the table.) |
| Hardware Specification | Yes | Our proposed CMLA is implemented via TensorFlow [1] and is run on a server with two NVIDIA Tesla P40 GPUs holding a graphics memory capacity of 24GB for each one. |
| Software Dependencies | Yes | Our proposed CMLA is implemented via TensorFlow [1] |
| Experiment Setup | Yes | All images are resized to 224 × 224 × 3 before being used as the inputs. In adversarial sample learning, we use the Adam optimizer with initial learning rates of 0.5 and 0.002 for the image and text modalities, respectively, and train each sample for T_max iterations. All hyper-parameters α, β, λ, ξ, γ, and η are set as 1 empirically. The mini-batch size is fixed at 128. ϵ_v is set as 8 for the image modality, and ϵ_t is set as 0.01 for the text modality. (See the setup sketch after the table.) |
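
The split sizes in the Dataset Splits row can be turned into a short, illustrative script. This is a hedged sketch in Python/NumPy, not the authors' code: only the MIRFlickr-25K sizes (2,000 query points, 10,000 training points, 5,000 points re-sampled for adversarial learning, remainder as retrieval database) come from the paper, while the index-shuffling scheme, the random seed, and names such as `split_mirflickr` are assumptions made for illustration.

```python
import numpy as np

def split_mirflickr(num_points, seed=0):
    """Illustrative MIRFlickr-25K split using the sizes quoted in the table.

    The shuffling scheme and seed are assumptions, not the authors' protocol.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_points)

    query_idx = perm[:2000]        # 2,000 query points
    train_idx = perm[2000:12000]   # 10,000 points to train the target network
    retrieval_idx = perm[12000:]   # "the remainder" kept as retrieval database
    # 5,000 training points are further sampled to learn adversarial samples.
    adv_idx = rng.choice(train_idx, size=5000, replace=False)
    return query_idx, train_idx, retrieval_idx, adv_idx

# The paper reports NUS-WIDE analogously, with 2,100 query points and
# 10,500 training points; those sizes would replace the constants above.
```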
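The optimizer and perturbation-bound settings in the Experiment Setup row can likewise be sketched. The TensorFlow snippet below is a hypothetical stand-in for the per-sample adversarial learning loop of Algorithm 1: the two Adam learning rates (0.5 for images, 0.002 for text), the bounds ϵ_v = 8 and ϵ_t = 0.01, and the batch size of 128 are quoted from the paper, while `cmla_loss`, the additive-perturbation formulation, the clipping step, and the concrete value of T_max are assumptions and do not reproduce the actual CMLA objective.

```python
import tensorflow as tf

# Settings quoted from the paper; the loss below is only a placeholder for the
# CMLA objective (whose terms are weighted by alpha, beta, lambda, xi, gamma,
# and eta, all set to 1 in the paper). Images are assumed resized to 224x224x3.
BATCH_SIZE = 128
T_MAX = 100          # assumption: the paper only names the budget T_max
EPS_V, EPS_T = 8.0, 0.01
opt_img = tf.keras.optimizers.Adam(learning_rate=0.5)    # image modality
opt_txt = tf.keras.optimizers.Adam(learning_rate=0.002)  # text modality

def learn_adversarial_batch(images, texts, cmla_loss):
    """Learn bounded additive perturbations for one mini-batch.

    `cmla_loss(adv_images, adv_texts)` is a hypothetical callable standing in
    for the CMLA objective against the target cross-modal hashing network.
    """
    delta_v = tf.Variable(tf.zeros_like(images))  # image perturbation
    delta_t = tf.Variable(tf.zeros_like(texts))   # text perturbation

    for _ in range(T_MAX):
        with tf.GradientTape() as tape:
            loss = cmla_loss(images + delta_v, texts + delta_t)
        grads = tape.gradient(loss, [delta_v, delta_t])
        opt_img.apply_gradients([(grads[0], delta_v)])
        opt_txt.apply_gradients([(grads[1], delta_t)])
        # Keep perturbations within the modality-specific bounds.
        delta_v.assign(tf.clip_by_value(delta_v, -EPS_V, EPS_V))
        delta_t.assign(tf.clip_by_value(delta_t, -EPS_T, EPS_T))

    return images + delta_v, texts + delta_t
```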