DM2C: Deep Mixed-Modal Clustering
Authors: Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework. In this section, we provide the empirical evaluation on two real-world mixed-modal datasets, Wikipedia and NUS-WIDE-10K. |
| Researcher Affiliation | Academia | 1 State Key Laboratory of Information Security, Institute of Information Engineering, CAS; 2 School of Cyber Security, University of Chinese Academy of Sciences; 3 Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS; 4 School of Computer Science and Tech., University of Chinese Academy of Sciences; 5 Key Laboratory of Big Data Mining and Knowledge Management, CAS; 6 Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1 Deep mixed-modal clustering algorithm |
| Open Source Code | No | The paper does not provide explicit access (e.g., specific repository link, explicit code release statement) to source code for the methodology described. |
| Open Datasets | Yes | The Wikipedia dataset [25] contains 2,866 image-text pairs selected from Wikipedia's featured articles collection. The NUS-WIDE-10K dataset [10] consists of 10,000 image-text pairs evenly selected from the 10 largest semantic categories of the NUS-WIDE dataset [8]. |
| Dataset Splits | No | The paper specifies training and test set splits, but does not explicitly provide details for a validation set split. |
| Hardware Specification | Yes | All the experiments are performed on Ubuntu 16.04 with an NVIDIA GTX 1080 Ti GPU. |
| Software Dependencies | Yes | Our proposed method is implemented using PyTorch 1.0 [24]. |
| Experiment Setup | Yes | (for Wikipedia) According to the architecture, we empirically set the learning rates for the auto-encoders, generators and discriminators to 1e-3, 1e-4, 5e-5, respectively. Meanwhile, the trade-off coefficient λ1 is set to 1 and λ2 is set to 2 for the objective function. For the weight clipping, the clipping range is fixed at 0.05. (for NUS-WIDE-10K) The learning rates for the auto-encoders, generators and discriminators are empirically set to 5e-4, 5e-5, 5e-5, respectively. λ1 and λ2 are both set to 1 to balance the loss. Moreover, the weight clipping range is fixed at 0.05, which is the same as in Wikipedia. |
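
A minimal PyTorch sketch of the reported optimizer and weight-clipping setup, assuming placeholder networks: the paper reports the per-component learning rates, the trade-off weights λ1/λ2, and the 0.05 clipping range, but not the layer sizes or the optimizer; `autoencoder`, `generator`, `discriminator`, the dimensions, and the Adam choice below are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical placeholder networks -- the report does not describe
# layer sizes, so these dimensions are illustrative only.
latent_dim = 128
autoencoder = nn.Sequential(
    nn.Linear(4096, latent_dim), nn.ReLU(), nn.Linear(latent_dim, 4096)
)
generator = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU())
discriminator = nn.Linear(latent_dim, 1)

# Reported hyperparameters for Wikipedia: learning rates 1e-3 / 1e-4 / 5e-5,
# lambda1 = 1, lambda2 = 2, weight-clipping range 0.05.
# (For NUS-WIDE-10K: 5e-4 / 5e-5 / 5e-5 and lambda1 = lambda2 = 1.)
lambda1, lambda2 = 1.0, 2.0
clip_value = 0.05

# The optimizer choice is an assumption; the paper only states the rates.
opt_ae = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
opt_gen = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=5e-5)

def clip_weights(module: nn.Module, bound: float = clip_value) -> None:
    """WGAN-style weight clipping, applied to the critic after each update."""
    for p in module.parameters():
        p.data.clamp_(-bound, bound)

clip_weights(discriminator)
```

Clipping the critic's weights into a small box around zero is the original WGAN device for enforcing a Lipschitz constraint, which is consistent with the fixed 0.05 clipping range the paper reports for both datasets.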