Learning A Structured Optimal Bipartite Graph for Co-Clustering

Authors: Feiping Nie, Xiaoqian Wang, Cheng Deng, Heng Huang

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical results are presented to verify the effectiveness and robustness of our model." "We conduct several experiments to evaluate the effectiveness and robustness of our model. On both synthetic and benchmark datasets we gain equivalent or even better clustering results than other related methods." (Section 5, Experimental Results)
Researcher Affiliation | Academia | 1 School of Computer Science, Center for OPTIMAL, Northwestern Polytechnical University, China; 2 Department of Electrical and Computer Engineering, University of Pittsburgh, USA; 3 School of Electronic Engineering, Xidian University, China. feipingnie@gmail.com, xqwang1991@gmail.com, chdeng@mail.xidian.edu.cn, heng.huang@pitt.edu
Pseudocode | Yes | "Algorithm 1: Algorithm to solve the problem (15)." "Algorithm 2: Algorithm to solve the problem (23)."
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology.
Open Datasets | Yes | "Reuters21578 dataset is processed and downloaded from http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html." LUNG dataset [1]; Prostate-MS dataset [15]; Prostate Cancer PSA410 dataset [10].
Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits. It describes how the synthetic data was generated and the properties of the benchmark datasets, but not how they were partitioned for model training and evaluation.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions specific methods (e.g., K-means, NCut, NMF, BSGP, ONMTF) but does not name software libraries or environments with version numbers used for implementation.
Experiment Setup | Yes | "For methods requiring a similarity graph as the input, i.e., NCut and NMF, we adopted the self-tuning Gaussian method [19] to construct the graph, where the number of neighbors was set to be 5 and the σ value was self-tuned. When running K-means we used 100 random initializations for all these four methods and recorded the average performance over these 100 runs as well as the best one with respect to the K-means objective function value. In our method, to accelerate the algorithmic procedure, we determined the parameter λ in a heuristic way: we first specified an initial value of λ; then, in each iteration, we computed the number of zero eigenvalues of L_S: if it was larger than k, we divided λ by 2; if it was smaller, we multiplied λ by 2; otherwise we stopped the iteration. The number of clusters was set to the ground truth. Before clustering, feature scaling was performed on each dataset so that all features lie on the same scale of [0, 1]. Also, the ℓ2-norm of each feature was normalized to 1."
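The λ schedule quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `tune_lambda` and the `update_similarity` callback (which would run one iteration of the paper's inner solver and return the Laplacian L_S of the learned bipartite graph) are hypothetical names introduced here for clarity.

```python
import numpy as np

def count_zero_eigenvalues(L, tol=1e-10):
    """Count (near-)zero eigenvalues of a symmetric graph Laplacian L.

    The number of zero eigenvalues equals the number of connected
    components of the graph, which the paper's heuristic compares to
    the target number of clusters k.
    """
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

def tune_lambda(update_similarity, k, lam=1.0, max_iter=50, tol=1e-10):
    """Heuristic lambda schedule described in the paper's setup:
    halve lambda when the Laplacian has more than k zero eigenvalues,
    double it when it has fewer, and stop when it has exactly k.

    `update_similarity` is a hypothetical callback: given the current
    lambda, it runs one iteration of the inner solver and returns the
    Laplacian L_S of the learned graph.
    """
    for _ in range(max_iter):
        L_S = update_similarity(lam)
        n_zero = count_zero_eigenvalues(L_S, tol)
        if n_zero > k:
            lam /= 2.0
        elif n_zero < k:
            lam *= 2.0
        else:
            break  # exactly k connected components: desired structure
    return lam
```

With a stub callback that returns a fixed two-component Laplacian and k = 2, the loop terminates immediately and λ is returned unchanged; with a three-component Laplacian it keeps halving λ, matching the quoted rule.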