DM2C: Deep Mixed-Modal Clustering

Authors: Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on several real-world mixed-modal datasets could demonstrate the superiority of our proposed framework. In this section, we provide the empirical evaluation on two real-world mixed-modal datasets, Wikipedia and NUS-WIDE-10K.
Researcher Affiliation | Academia | (1) State Key Laboratory of Information Security, Institute of Information Engineering, CAS; (2) School of Cyber Security, University of Chinese Academy of Sciences; (3) Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS; (4) School of Computer Science and Tech., University of Chinese Academy of Sciences; (5) Key Laboratory of Big Data Mining and Knowledge Management, CAS; (6) Peng Cheng Laboratory
Pseudocode | Yes | Algorithm 1: Deep mixed-modal clustering algorithm
Open Source Code | No | The paper does not provide explicit access to source code for the described methodology (e.g., a specific repository link or an explicit code-release statement).
Open Datasets | Yes | The Wikipedia dataset [25] contains 2,866 image-text pairs selected from the Wikipedia featured-articles collection. The NUS-WIDE-10K dataset [10] consists of 10,000 image-text pairs evenly selected from the 10 largest semantic categories of the NUS-WIDE dataset [8].
Dataset Splits | No | The paper specifies training and test set splits, but does not explicitly provide details for a validation set split.
Hardware Specification | Yes | All the experiments are performed on Ubuntu 16.04 with an NVIDIA GTX 1080 Ti GPU.
Software Dependencies | Yes | Our proposed method is implemented using PyTorch 1.0 [24].
Experiment Setup | Yes | For Wikipedia: According to the architecture, we empirically set the learning rates for the auto-encoders, generators and discriminators to 1e-3, 1e-4, 5e-5, respectively. Meanwhile, the trade-off coefficient λ1 is set to 1 and λ2 is set to 2 for the objective function. For the weight clipping, the clipping range is fixed at 0.05. For NUS-WIDE-10K: The learning rates for the auto-encoders, generators and discriminators are empirically set to 5e-4, 5e-5, 5e-5, respectively. λ1 and λ2 are both set to 1 to balance the loss. Moreover, the weight clipping range is fixed at 0.05, which is the same as in Wikipedia.
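
For concreteness, the reported Wikipedia hyperparameters could be wired up roughly as sketched below. This is a minimal sketch under stated assumptions, not the authors' code: the tiny placeholder networks, the choice of RMSprop, and the grouping of loss terms are assumptions, while only the learning rates (1e-3, 1e-4, 5e-5), the trade-off coefficients (λ1 = 1, λ2 = 2), and the weight-clipping range (0.05) come from the reported setup.

    import torch
    import torch.nn as nn

    # Placeholder networks standing in for the paper's auto-encoders,
    # cross-modal generators, and discriminators (architectures assumed).
    autoencoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
    generator = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    # Learning rates reported for the Wikipedia experiments; the optimizer
    # type (RMSprop) is an assumption, not stated in the excerpt.
    opt_ae = torch.optim.RMSprop(autoencoder.parameters(), lr=1e-3)
    opt_gen = torch.optim.RMSprop(generator.parameters(), lr=1e-4)
    opt_dis = torch.optim.RMSprop(discriminator.parameters(), lr=5e-5)

    # Trade-off coefficients for the overall objective (Wikipedia setting).
    lambda1, lambda2 = 1.0, 2.0

    def total_loss(rec_loss, cycle_loss, adv_loss):
        # Hypothetical combination of loss terms; the exact grouping of the
        # objective is not specified in the excerpt above.
        return rec_loss + lambda1 * cycle_loss + lambda2 * adv_loss

    # Weight clipping applied to the discriminator after each update,
    # using the clipping range of 0.05 reported in the paper.
    clip_value = 0.05
    with torch.no_grad():
        for p in discriminator.parameters():
            p.clamp_(-clip_value, clip_value)

In an actual training loop, the clipping step would run immediately after each discriminator update, mirroring the WGAN-style weight constraint implied by the fixed clipping range.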