DM2C: Deep Mixed-Modal Clustering

Authors: Yangbangyan Jiang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on several real-world mixed-modal datasets could demonstrate the superiority of our proposed framework. In this section, we provide the empirical evaluation on two real-world mixed-modal datasets, Wikipedia and NUS-WIDE-10K.
Researcher Affiliation | Academia | (1) State Key Laboratory of Information Security, Institute of Information Engineering, CAS; (2) School of Cyber Security, University of Chinese Academy of Sciences; (3) Key Lab. of Intelligent Information Processing, Institute of Computing Technology, CAS; (4) School of Computer Science and Tech., University of Chinese Academy of Sciences; (5) Key Laboratory of Big Data Mining and Knowledge Management, CAS; (6) Peng Cheng Laboratory
Pseudocode | Yes | Algorithm 1: Deep mixed-modal clustering algorithm
Open Source Code | No | The paper does not provide explicit access to source code for the described methodology (e.g., a specific repository link or an explicit code-release statement).
Open Datasets | Yes | The Wikipedia dataset [25] contains 2,866 image-text pairs selected from the Wikipedia featured-articles collection. The NUS-WIDE-10K dataset [10] consists of 10,000 image-text pairs evenly selected from the 10 largest semantic categories of the NUS-WIDE dataset [8].
Dataset Splits | No | The paper specifies training and test set splits, but does not explicitly provide details for a validation set split.
Hardware Specification | Yes | All the experiments are performed on Ubuntu 16.04 with an NVIDIA GTX 1080 Ti GPU.
Software Dependencies | Yes | Our proposed method is implemented using PyTorch 1.0 [24].
Experiment Setup | Yes | For Wikipedia: According to the architecture, we empirically set the learning rates for the auto-encoders, generators and discriminators to 1e-3, 1e-4, 5e-5, respectively. Meanwhile, the trade-off coefficient λ1 is set to 1 and λ2 is set to 2 for the objective function. For the weight clipping, the clipping range is fixed at 0.05. For NUS-WIDE-10K: The learning rates for the auto-encoders, generators and discriminators are empirically set to 5e-4, 5e-5, 5e-5, respectively. λ1 and λ2 are both set to 1 to balance the loss. Moreover, the weight clipping range is fixed at 0.05, which is the same as in Wikipedia.
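
For concreteness, the reported Wikipedia hyperparameters could be wired up roughly as sketched below. This is a minimal sketch under stated assumptions, not the authors' code: the tiny placeholder networks, the choice of RMSprop, and the grouping of loss terms are assumptions, while only the learning rates (1e-3, 1e-4, 5e-5), the trade-off coefficients (λ1 = 1, λ2 = 2), and the weight-clipping range (0.05) come from the reported setup.

    import torch
    import torch.nn as nn

    # Placeholder networks standing in for the paper's auto-encoders,
    # cross-modal generators, and discriminators (architectures assumed).
    autoencoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
    generator = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    # Learning rates reported for the Wikipedia experiments; the optimizer
    # type (RMSprop) is an assumption, not stated in the excerpt.
    opt_ae = torch.optim.RMSprop(autoencoder.parameters(), lr=1e-3)
    opt_gen = torch.optim.RMSprop(generator.parameters(), lr=1e-4)
    opt_dis = torch.optim.RMSprop(discriminator.parameters(), lr=5e-5)

    # Trade-off coefficients for the overall objective (Wikipedia setting).
    lambda1, lambda2 = 1.0, 2.0

    def total_loss(rec_loss, cycle_loss, adv_loss):
        # Hypothetical combination of loss terms; the exact grouping of the
        # objective is not specified in the excerpt above.
        return rec_loss + lambda1 * cycle_loss + lambda2 * adv_loss

    # Weight clipping applied to the discriminator after each update,
    # using the clipping range of 0.05 reported in the paper.
    clip_value = 0.05
    with torch.no_grad():
        for p in discriminator.parameters():
            p.clamp_(-clip_value, clip_value)

In an actual training loop, the clipping step would run immediately after each discriminator update, mirroring the WGAN-style weight constraint implied by the fixed clipping range.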