Maximum Margin Dirichlet Process Mixtures for Clustering

Authors: Gang Chen, Haiying Zhang, Caiming Xiong

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test our model and show comparative results over the traditional DPM and other nonparametric clustering approaches." "We test our model on both synthetic and real datasets, and show comparative results over DPM and other nonparametric clustering methods." "In this section, we conduct empirical studies on both synthetic and real datasets to evaluate the performance of our method. We also compare the computational cost between our model and baselines when we vary the number of data samples and dimensionality."
Researcher Affiliation | Collaboration | (1) Computer Science and Engineering, SUNY at Buffalo, Buffalo, NY 14260, gangchen@buffalo.edu; (2) State Key Laboratory of Remote Sensing Science, RADI, Chinese Academy of Sciences, Beijing 100101; (3) MetaMind Inc., 172 University Avenue, Palo Alto, CA 94301, cmxiong@metamind.io
Pseudocode | Yes | Algorithm 1: Maximum margin Dirichlet process model (a hedged skeleton of such a loop is sketched after the table).
Open Source Code | No | The paper does not provide a link to source code and does not state that an implementation of the described method is publicly released.
Open Datasets | Yes | "The synthetic datasets are composed of 3 toy datasets (available online): Jain's toy dataset (Jain 2007), Aggregation (Gionis, Mannila, and Tsaparas 2007) and the Flame dataset (Fu and Medico 2007). For the real datasets, we test our method on Iris, Wine, Glass and Wdbc datasets, which are available from the UCI Machine Learning Data Repository. We also test our method on MNIST digits, the 20 newsgroup dataset and the Reuters data set." Dataset URLs cited in the paper: http://cs.joensuu.fi/sipu/datasets/ (toy datasets), http://www.ics.uci.edu/~mlearn/MLRepository.html (UCI), http://yann.lecun.com/exdb/mnist/ (MNIST), http://people.csail.mit.edu/jrennie/20Newsgroups (20 Newsgroups), http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html (Reuters). A loading sketch follows the table.
Dataset Splits | No | The paper mentions using 2000 examples from the 60000 MNIST training images and 10000 examples for 20 Newsgroups, but does not specify explicit train/validation/test splits.
Hardware Specification | Yes | "We implemented our algorithm with Matlab, and all experiments were conducted on Intel(R) Core(TM) i7-3770K CPU running at 3.50GHz with 32 GB of RAM."
Software Dependencies | No | The paper mentions that the algorithm was implemented in Matlab but does not specify a version number or any other software dependencies with their versions.
Experiment Setup | Yes | "In our MMDPM setting, we initialize λ = 3 in the conditional model in Eq. (9) if it is not specified, and C = 0.01 in the passive aggressive updating algorithm in Eq. (14). As for the number of iterations, we set T = 100. The initial number of components was set to 1 and the concentration parameter α was set to 4 in all experiments." These values are used in the sketch below.
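
For readers attempting a re-run, the following is a minimal data-loading sketch based on the dataset list above. The helper function, file names, scikit-learn loaders, and subsampling seed are assumptions for illustration; the paper only provides the download URLs, and the Joensuu toy files (jain, flame, aggregation) are assumed to be whitespace-separated x, y, label columns.

```python
# Hypothetical loading sketch for the datasets listed above.
# File names, scikit-learn loaders, and the subsampling seed are assumptions;
# the paper itself only cites the download URLs.
import numpy as np
from sklearn.datasets import load_iris, load_wine, fetch_openml

def load_joensuu_toy(path):
    """Load a toy dataset (e.g. jain.txt, flame.txt, Aggregation.txt)
    downloaded from http://cs.joensuu.fi/sipu/datasets/.
    Each row is assumed to be: x  y  label."""
    data = np.loadtxt(path)
    return data[:, :2], data[:, 2].astype(int)

# UCI datasets (Iris, Wine; Glass and Wdbc come from the UCI URL above).
X_iris, y_iris = load_iris(return_X_y=True)
X_wine, y_wine = load_wine(return_X_y=True)

# MNIST: the paper reports using 2000 of the 60000 training images.
X_mnist, y_mnist = fetch_openml("mnist_784", version=1,
                                return_X_y=True, as_frame=False)
rng = np.random.default_rng(0)                    # seed not reported in the paper
idx = rng.choice(60000, size=2000, replace=False)
X_sub, y_sub = X_mnist[idx].astype(float) / 255.0, y_mnist[idx]
```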
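
The paper's Algorithm 1 itself is not reproduced in this report, so the skeleton below is only a generic sketch of a max-margin DP-mixture loop wired to the reported hyperparameters (λ = 3, C = 0.01, T = 100, one initial component, α = 4). The scoring rule, the new-cluster test, and the passive-aggressive step are illustrative stand-ins rather than the paper's Eq. (9) and Eq. (14), and all function and variable names are hypothetical.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
LAMBDA = 3.0   # weight in the conditional model, Eq. (9)
C = 0.01       # aggressiveness parameter in the PA update, Eq. (14)
T = 100        # number of iterations over the data
ALPHA = 4.0    # DP concentration parameter (controls new-cluster creation)

def mmdpm_sketch(X, lam=LAMBDA, C=C, T=T, alpha=ALPHA, seed=0):
    """Generic sketch of a max-margin DP mixture loop (not the paper's exact rules).
    Each cluster k keeps a weight vector w[k]; a point is assigned to the
    highest-scoring cluster, a new cluster is opened when even the best
    score falls below a threshold tied to alpha, and the winning weight
    vector receives a passive-aggressive style update capped by C."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    w = [X[rng.integers(n)].copy()]          # start from a single component
    z = np.zeros(n, dtype=int)               # cluster assignments

    for _ in range(T):
        for i in rng.permutation(n):
            x = X[i]
            scores = np.array([lam * wk @ x for wk in w])
            k = int(np.argmax(scores))
            if scores[k] < np.log(alpha):
                # Open a new component, mimicking DP-style cluster growth.
                w.append(x.copy())
                k = len(w) - 1
            else:
                # PA-I style update toward x with hinge loss, step capped by C.
                loss = max(0.0, 1.0 - w[k] @ x)
                tau = min(C, loss / (x @ x + 1e-12))
                w[k] = w[k] + tau * x
            z[i] = k
    return z, w
```

On the data loaded above, a call such as z, w = mmdpm_sketch(X_iris) returns hard assignments and per-cluster weight vectors; whether the resulting cluster count matches the paper's numbers depends on the actual conditional model and update rules, which this sketch does not reproduce.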