Generalized Transitive Distance with Minimum Spanning Random Forest

Authors: Zhiding Yu, Weiyang Liu, Wenbo Liu, Xi Peng, Zhuo Hui, B. V. K. Vijaya Kumar

IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Comprehensive experiments on data clustering and image segmentation show that MSRF with max pooling improves the clustering performance over single MST and achieves state of the art performance on the Berkeley Segmentation Dataset. In this section, we conduct experiment on a set of very challenging toy examples to test the algorithm performance. We compare our results with several popular spectral clustering methods including spectral clustering [Ng et al., 2002], self-tuning spectral clustering [Zelnik-Manor and Perona, 2004] and normalized cuts [Shi and Malik, 2000]. Figure 2 shows a set of toy example clustering results. Overall, we have carefully tuned the scale parameters for both spectral clustering and normalized cuts on each dataset. The number of trees is 3 for GTD (Seq. Kruskal), and 20 for GTD (Perturb.). The perturbation strength ϵ is set to 2. |
| Researcher Affiliation | Academia | Zhiding Yu1, Weiyang Liu2, Wenbo Liu1, Xi Peng3, Zhuo Hui1, B. V. K. Vijaya Kumar1. 1Dept. of Electrical and Computer Eng., Carnegie Mellon University; 2School of Electronic and Computer Eng., Peking University, P.R. China; 3I2R, Agency for Sci., Tech. and Research (A*STAR), Singapore. yzhiding@andrew.cmu.edu, wyliu@pku.edu.cn, pangsaai@gmail.com, kumar@ece.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Extended Sequential Kruskal's Algorithm; Algorithm 2 Random Perturbation Algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | The i-Vector dataset consists of 36572 600-dimensional pre-extracted i-vectors with 4958 identities. In addition to the i-Vector dataset we also form another large scale dataset (NIST) with the NIST SRE 2004, 2005, 2006 and 2008 corpora on the telephone channel. A total of 21704 500-dimensional i-vectors with 1738 identities were extracted under the framework of [Li and Narayanan, 2014]. We conduct image segmentation experiments on the BSDS300 dataset. |
| Dataset Splits | No | The paper mentions datasets but does not explicitly specify training, validation, and test splits with percentages, sample counts, or references to predefined splits for reproduction. It only states the total size of the i-Vector dataset. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models or memory used for running experiments. |
| Software Dependencies | No | The paper mentions using certain algorithms and frameworks (e.g., k-means, spectral clustering, structured random forest, Kruskal's algorithm) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The number of trees is 3 for GTD (Seq. Kruskal), and 20 for GTD (Perturb.). The perturbation strength ϵ is set to 2. T which is the number of trees are set to 30 for both the NIST dataset and the i-Vector dataset. The perturbation strength ϵ are respectively set to 0.015 and 0.03. We directly perform k-means on the matrix rows without SVD, and use mean shift to precluster the rows of the GTD matrix to roughly initialize the cluster centers. A lower bound of 2 and an upper bound of 12 is set on the final cluster number. |
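
Since no code is released, the following is a minimal Python sketch of how the reported setup (a number of perturbed trees T, a perturbation strength ϵ, and max pooling over trees) might fit together. It is not the authors' implementation. The function names `tree_minimax` and `gtd_max_pool`, the uniform noise model on edge weights, and the choice to measure path maxima with the unperturbed distances are all assumptions made for illustration; the paper's Algorithm 2 may differ in each of these respects.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def tree_minimax(n, edges):
    """Largest edge weight on the unique tree path between every node pair."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    out = np.zeros((n, n))
    for src in range(n):
        # Depth-first traversal from src, carrying the running path maximum.
        seen = np.zeros(n, dtype=bool)
        seen[src] = True
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    out[src, v] = max(out[src, u], w)
                    stack.append(v)
    return out


def gtd_max_pool(X, n_trees=20, eps=2.0, seed=0):
    """Max-pool tree path maxima over several randomly perturbed MSTs.

    Assumption: each tree is the MST of the Euclidean distance matrix plus
    symmetric uniform noise in [0, eps]; path maxima use the original,
    unperturbed distances. Assumes no duplicate points in X.
    """
    rng = np.random.default_rng(seed)
    D = squareform(pdist(X))
    n = D.shape[0]
    pooled = np.zeros((n, n))
    for _ in range(n_trees):
        noise = np.triu(rng.uniform(0.0, eps, size=(n, n)), k=1)
        P = D + noise + noise.T
        np.fill_diagonal(P, 0.0)
        mst = minimum_spanning_tree(P).tocoo()
        edges = [(i, j, D[i, j]) for i, j in zip(mst.row, mst.col)]
        pooled = np.maximum(pooled, tree_minimax(n, edges))
    return pooled
```

With `eps=0.0` every tree is the plain MST, so the result reduces to the ordinary single-tree transitive distance. The rows of the returned matrix could then be passed to a clustering step such as the k-means procedure quoted in the Experiment Setup row; that step is not shown here.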