Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling

Authors: Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we evaluate the performance of the proposal algorithms through experiments. We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0 79. We verify the clustering qualities of the proposal algorithms through synthetic data, which is created as follows. This data set consists of tensors A, which has n dimensions in each order, and includes k ground truth co-clusters for some integers k, n, and m.
Researcher Affiliation Academia Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi National Institute of Informatics, Japan, JST, ERATO, Kawarabayashi Large Graph Project, Japan {hatano, takuro, k keniti}@nii.ac.jp Shizuoka University, Japan maehara.takanori@shizuoka.ac.jp
Pseudocode Yes Algorithm 1 Hypergraph version of Karger and Stein s algorithm; Algorithm 2 Co-clustering algorithm
Open Source Code No The paper mentions downloading a Matlab implementation of *other* algorithms from a URL but does not provide a link or statement for the authors' own code.
Open Datasets Yes To confirm the quality of co-clusterings produced by the proposed algorithm, we applied it to a tensor generated from data of sharing bike system in New York City available at https://www.citibikenyc.com/system-data.
Dataset Splits No The paper describes the generation of synthetic data and the use of real-world datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for any of these datasets.
Hardware Specification Yes We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0 79.
Software Dependencies Yes We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0 79.
Experiment Setup Yes As for the threshold θ in Algorithm 2, we compute the minimum weight W of k-way cuts computed by iterating KS(G, w, k) 1000 times, and define θ as αW for some parameter α 1. The number of iterations l of the proposed algorithms set to 1000 and α = 1.00, and k is fixed to the number of co-clusters in the ground truth co-clustering. The runtime of the proposed algorithm is obtained by setting l = 1 and k = 3. The result described in Table 3 and Figure 3 is obtained by setting l = 1000 and k = 25. As a preprocessing, we omitted elements (j1, j2, j3) whose weight A(j1, j2, j3) is less than 50, that is we reduce the weight of the elements to 0.