Optimal Margin Distribution Clustering

Authors: Teng Zhang, Zhi-Hua Zhou

AAAI 2018

Reproducibility assessment (each item lists the variable, the result, and the supporting LLM response):
Research Type: Experimental. "Extensive experiments on UCI data sets show that ODMC is significantly better than compared methods, which verifies the superiority of optimal margin distribution learning." "In this section, we empirically evaluate the proposed method on 24 UCI data sets. Table 1 summarizes the statistics of these data sets."
Researcher Affiliation: Academia. Teng Zhang, Zhi-Hua Zhou. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China. {zhangt, zhouzh}@lamda.nju.edu.cn
Pseudocode: Yes. "Algorithm 1 Stochastic mirror descent for ODMC"
Open Source Code: No. The paper does not provide any explicit statement about making the source code for its methodology publicly available, nor does it include a link to a code repository.
Open Datasets: Yes. "Extensive experiments on UCI data sets show that ODMC is significantly better than compared methods." "In this section, we empirically evaluate the proposed method on 24 UCI data sets. Table 1 summarizes the statistics of these data sets."
Dataset Splits: No. The paper mentions parameter selection for the models (e.g., "C or λ is selected from {1, 10, 100, 1000}", "ν and θ are selected from {0.2, 0.4, 0.6, 0.8}") but does not explicitly describe the use of a validation set or a specific data-splitting methodology (such as k-fold cross-validation or fixed train/validation percentages) for this parameter selection.
Hardware Specification: Yes. "All the experiments are performed with MATLAB 2017b on a machine with 8 2.60 GHz CPUs and 32GB main memory."
Software Dependencies: Yes. "All the experiments are performed with MATLAB 2017b"
Experiment Setup: Yes. "For GMMC, IterSVR, CPMMC, LG-MMC, and ODMC, the parameter C or λ is selected from {1, 10, 100, 1000}. For ODMC, ν and θ are selected from {0.2, 0.4, 0.6, 0.8}. For all data sets, both the linear and Gaussian kernels are used. In particular, the width σ of the Gaussian kernel is picked from {0.25γ, 0.5γ, γ, 2γ, 4γ}, where γ is the average distance between instances. The parameter of normalized cut is chosen from the same range as σ. The balance constraint is set in the same manner as in (Zhang, Tsang, and Kwok 2007), i.e., 0.03m for balanced data sets and 0.3m for imbalanced data sets."