Maximum Margin Dirichlet Process Mixtures for Clustering

Authors: Gang Chen, Haiying Zhang, Caiming Xiong

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test our model and show comparative results over the traditional DPM and other nonparametric clustering approaches." "We test our model on both synthetic and real datasets, and show comparative results over DPM and other nonparametric clustering methods." "In this section, we conduct empirical studies on both synthetic and real datasets to evaluate the performance of our method. We also compare the computational cost between our model and baselines when we vary the number of data samples and dimensionality."
Researcher Affiliation | Collaboration | (1) Computer Science and Engineering, SUNY at Buffalo, Buffalo, NY 14260, gangchen@buffalo.edu; (2) State Key Laboratory of Remote Sensing Science, RADI, Chinese Academy of Sciences, Beijing 100101; (3) MetaMind Inc., 172 University Avenue, Palo Alto, CA 94301, cmxiong@metamind.io
Pseudocode | Yes | Algorithm 1: Maximum margin Dirichlet process model (a hedged skeleton of such a loop is sketched after the table).
Open Source Code | No | The paper does not provide a link to source code and does not state that an implementation of the described method is publicly released.
Open Datasets | Yes | "The synthetic datasets are composed of 3 toy datasets (available online): Jain's toy dataset (Jain 2007), Aggregation (Gionis, Mannila, and Tsaparas 2007) and the Flame dataset (Fu and Medico 2007). For the real datasets, we test our method on Iris, Wine, Glass and Wdbc datasets, which are available from the UCI Machine Learning Data Repository. We also test our method on MNIST digits, the 20 newsgroup dataset and the Reuters data set." Dataset URLs cited in the paper: http://cs.joensuu.fi/sipu/datasets/ (toy datasets), http://www.ics.uci.edu/~mlearn/MLRepository.html (UCI), http://yann.lecun.com/exdb/mnist/ (MNIST), http://people.csail.mit.edu/jrennie/20Newsgroups (20 Newsgroups), http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html (Reuters). A loading sketch follows the table.
Dataset Splits | No | The paper mentions using 2000 examples from the 60000 MNIST training images and 10000 examples for 20 Newsgroups, but does not specify explicit train/validation/test splits.
Hardware Specification | Yes | "We implemented our algorithm with Matlab, and all experiments were conducted on Intel(R) Core(TM) i7-3770K CPU running at 3.50GHz with 32 GB of RAM."
Software Dependencies | No | The paper mentions that the algorithm was implemented in Matlab but does not specify a version number or any other software dependencies with their versions.
Experiment Setup | Yes | "In our MMDPM setting, we initialize λ = 3 in the conditional model in Eq. (9) if it is not specified, and C = 0.01 in the passive aggressive updating algorithm in Eq. (14). As for the number of iterations, we set T = 100. The initial number of components was set to 1 and the concentration parameter α was set to 4 in all experiments." These values are used in the sketch below.
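
For readers attempting a re-run, the following is a minimal data-loading sketch based on the dataset list above. The helper function, file names, scikit-learn loaders, and subsampling seed are assumptions for illustration; the paper only provides the download URLs, and the Joensuu toy files (jain, flame, aggregation) are assumed to be whitespace-separated x, y, label columns.

```python
# Hypothetical loading sketch for the datasets listed above.
# File names, scikit-learn loaders, and the subsampling seed are assumptions;
# the paper itself only cites the download URLs.
import numpy as np
from sklearn.datasets import load_iris, load_wine, fetch_openml

def load_joensuu_toy(path):
    """Load a toy dataset (e.g. jain.txt, flame.txt, Aggregation.txt)
    downloaded from http://cs.joensuu.fi/sipu/datasets/.
    Each row is assumed to be: x  y  label."""
    data = np.loadtxt(path)
    return data[:, :2], data[:, 2].astype(int)

# UCI datasets (Iris, Wine; Glass and Wdbc come from the UCI URL above).
X_iris, y_iris = load_iris(return_X_y=True)
X_wine, y_wine = load_wine(return_X_y=True)

# MNIST: the paper reports using 2000 of the 60000 training images.
X_mnist, y_mnist = fetch_openml("mnist_784", version=1,
                                return_X_y=True, as_frame=False)
rng = np.random.default_rng(0)                    # seed not reported in the paper
idx = rng.choice(60000, size=2000, replace=False)
X_sub, y_sub = X_mnist[idx].astype(float) / 255.0, y_mnist[idx]
```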
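
The paper's Algorithm 1 itself is not reproduced in this report, so the skeleton below is only a generic sketch of a max-margin DP-mixture loop wired to the reported hyperparameters (λ = 3, C = 0.01, T = 100, one initial component, α = 4). The scoring rule, the new-cluster test, and the passive-aggressive step are illustrative stand-ins rather than the paper's Eq. (9) and Eq. (14), and all function and variable names are hypothetical.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
LAMBDA = 3.0   # weight in the conditional model, Eq. (9)
C = 0.01       # aggressiveness parameter in the PA update, Eq. (14)
T = 100        # number of iterations over the data
ALPHA = 4.0    # DP concentration parameter (controls new-cluster creation)

def mmdpm_sketch(X, lam=LAMBDA, C=C, T=T, alpha=ALPHA, seed=0):
    """Generic sketch of a max-margin DP mixture loop (not the paper's exact rules).
    Each cluster k keeps a weight vector w[k]; a point is assigned to the
    highest-scoring cluster, a new cluster is opened when even the best
    score falls below a threshold tied to alpha, and the winning weight
    vector receives a passive-aggressive style update capped by C."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    w = [X[rng.integers(n)].copy()]          # start from a single component
    z = np.zeros(n, dtype=int)               # cluster assignments

    for _ in range(T):
        for i in rng.permutation(n):
            x = X[i]
            scores = np.array([lam * wk @ x for wk in w])
            k = int(np.argmax(scores))
            if scores[k] < np.log(alpha):
                # Open a new component, mimicking DP-style cluster growth.
                w.append(x.copy())
                k = len(w) - 1
            else:
                # PA-I style update toward x with hinge loss, step capped by C.
                loss = max(0.0, 1.0 - w[k] @ x)
                tau = min(C, loss / (x @ x + 1e-12))
                w[k] = w[k] + tau * x
            z[i] = k
    return z, w
```

On the data loaded above, a call such as z, w = mmdpm_sketch(X_iris) returns hard assignments and per-cluster weight vectors; whether the resulting cluster count matches the paper's numbers depends on the actual conditional model and update rules, which this sketch does not reproduce.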