Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling

Authors: Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi

AAAI 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of the proposed algorithms through experiments. We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690 2.9 GHz processor and 512 GB memory, and implemented our algorithm in Java 1.7.0_79. We verify the clustering quality of the proposed algorithms on synthetic data, which is created as follows. This data set consists of tensors A, which have n dimensions in each order, and includes k ground-truth co-clusters for some integers k, n, and m.
Researcher Affiliation | Academia | Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi: National Institute of Informatics, Japan; JST, ERATO, Kawarabayashi Large Graph Project, Japan; Shizuoka University, Japan.
Pseudocode | Yes | Algorithm 1: Hypergraph version of Karger and Stein's algorithm; Algorithm 2: Co-clustering algorithm.
Open Source Code | No | The paper mentions downloading a Matlab implementation of *other* algorithms from a URL but does not provide a link or statement for the authors' own code.
Open Datasets | Yes | To confirm the quality of co-clusterings produced by the proposed algorithm, we applied it to a tensor generated from data of the bike-sharing system in New York City, available at https://www.citibikenyc.com/system-data.
Dataset Splits | No | The paper describes the generation of synthetic data and the use of real-world datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for any of these datasets.
Hardware Specification | Yes | We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690 2.9 GHz processor and 512 GB memory, and implemented our algorithm in Java 1.7.0_79.
Software Dependencies | Yes | We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690 2.9 GHz processor and 512 GB memory, and implemented our algorithm in Java 1.7.0_79.
Experiment Setup | Yes | As for the threshold θ in Algorithm 2, we compute the minimum weight W of the k-way cuts obtained by iterating KS(G, w, k) 1000 times, and define θ as αW for some parameter α ≥ 1. The number of iterations l of the proposed algorithms is set to 1000 and α = 1.00, and k is fixed to the number of co-clusters in the ground-truth co-clustering. The runtime of the proposed algorithm is obtained by setting l = 1 and k = 3. The results described in Table 3 and Figure 3 are obtained by setting l = 1000 and k = 25. As preprocessing, we omitted elements (j1, j2, j3) whose weight A(j1, j2, j3) is less than 50; that is, we reduced the weight of those elements to 0.
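The threshold rule quoted in the Experiment Setup row (θ = αW, where W is the minimum k-way cut weight over 1000 runs of KS(G, w, k)) can be illustrated with a toy sketch. This is not the authors' Java implementation: it assumes a simple representation of a weighted hypergraph as vertex tuples with parallel weights, and the function names (`ks_contraction_cut`, `estimate_threshold`) are invented for illustration of a Karger/Stein-style random contraction.

```python
import random

def ks_contraction_cut(n, edges, weights, k, rng):
    """One run of a Karger/Stein-style random contraction on a weighted
    hypergraph with vertices 0..n-1. Repeatedly contracts a hyperedge
    chosen with probability proportional to its weight until only k
    super-nodes remain, then returns the weight of the induced k-way cut:
    the total weight of hyperedges not contained in one super-node."""
    label = list(range(n))                    # super-node label per vertex
    attempts = 0
    while len(set(label)) > k and attempts < 10_000:
        attempts += 1
        e = rng.choices(edges, weights=weights, k=1)[0]
        parts = {label[v] for v in e}
        if len(parts) == 1:
            continue                          # hyperedge already internal
        if len(set(label)) - (len(parts) - 1) < k:
            continue                          # would overshoot below k parts
        target = label[e[0]]
        for v in range(n):                    # merge all touched super-nodes
            if label[v] in parts:
                label[v] = target
    return sum(w for e, w in zip(edges, weights)
               if len({label[v] for v in e}) > 1)

def estimate_threshold(n, edges, weights, k, alpha=1.0, iters=1000, seed=0):
    """theta = alpha * W, with W the minimum cut weight over `iters` runs,
    mirroring the paper's stated setup (1000 iterations, alpha = 1.00)."""
    rng = random.Random(seed)
    W = min(ks_contraction_cut(n, edges, weights, k, rng)
            for _ in range(iters))
    return alpha * W
```

On a small graph with two heavy triangles joined by one light edge, repeated contraction runs recover the light edge as the minimum 2-way cut with high probability, which is exactly the role W plays in setting θ.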
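The preprocessing step quoted above (reducing tensor entries with weight below 50 to 0) amounts to simple sparsification. A minimal sketch, assuming the tensor is stored sparsely as a dict from index triples to weights (the name `sparsify` and the dict representation are our assumptions, not the paper's):

```python
def sparsify(tensor_entries, threshold=50):
    """Drop entries (j1, j2, j3) whose weight A(j1, j2, j3) is below the
    threshold, i.e. treat them as weight 0 in the sparse representation."""
    return {idx: w for idx, w in tensor_entries.items() if w >= threshold}
```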