Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling
Authors: Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of the proposed algorithms through experiments. We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0_79. We verify the clustering quality of the proposed algorithms on synthetic data, which is created as follows. This data set consists of tensors A, each of which has n dimensions in each order and includes k ground-truth co-clusters, for some integers k, n, and m. |
| Researcher Affiliation | Academia | Daisuke Hatano, Takuro Fukunaga, Takanori Maehara, Ken-ichi Kawarabayashi — National Institute of Informatics, Japan; JST ERATO Kawarabayashi Large Graph Project, Japan; Shizuoka University, Japan |
| Pseudocode | Yes | Algorithm 1: Hypergraph version of Karger and Stein's algorithm; Algorithm 2: Co-clustering algorithm |
| Open Source Code | No | The paper mentions downloading a Matlab implementation of *other* algorithms from a URL but does not provide a link or statement for the authors' own code. |
| Open Datasets | Yes | To confirm the quality of co-clusterings produced by the proposed algorithm, we applied it to a tensor generated from data of the bike-sharing system in New York City, available at https://www.citibikenyc.com/system-data. |
| Dataset Splits | No | The paper describes the generation of synthetic data and the use of real-world datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for any of these datasets. |
| Hardware Specification | Yes | We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0_79. |
| Software Dependencies | Yes | We conducted experiments on an Ubuntu server with an Intel Xeon E5-2690, 2.9GHz processor and 512GB memory, and implemented our algorithm in Java 1.7.0_79. |
| Experiment Setup | Yes | As for the threshold θ in Algorithm 2, we compute the minimum weight W of k-way cuts computed by iterating KS(G, w, k) 1000 times, and define θ as αW for some parameter α ≥ 1. The number of iterations l of the proposed algorithms is set to 1000, α = 1.00, and k is fixed to the number of co-clusters in the ground-truth co-clustering. The runtime of the proposed algorithm is obtained by setting l = 1 and k = 3. The results described in Table 3 and Figure 3 are obtained by setting l = 1000 and k = 25. As preprocessing, we omitted elements (j1, j2, j3) whose weight A(j1, j2, j3) is less than 50; that is, we reduce the weight of such elements to 0. |
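The experiment setup above repeatedly runs a Karger–Stein-style random contraction on a hypergraph (Algorithm 1 in the paper) and derives the threshold θ = αW from the minimum k-way cut weight W found over many runs. The sketch below illustrates that idea only; it is not the authors' implementation (which is in Java and not released), and the function names `random_contraction_kcut` and `kcut_threshold` as well as the hypergraph representation are assumptions for illustration.

```python
import random

def random_contraction_kcut(vertices, hyperedges, k, rng):
    """One run of a Karger/Stein-style random contraction on a hypergraph.

    hyperedges: list of (frozenset_of_vertices, weight) pairs.
    Contracts hyperedges chosen with probability proportional to weight
    until k super-vertices remain, then returns the cut weight: the total
    weight of hyperedges that still span two or more clusters.
    """
    parent = {v: v for v in vertices}

    def find(v):  # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    n = len(vertices)
    while n > k:
        # Candidates: hyperedges that still span >1 cluster and whose
        # contraction would not leave fewer than k super-vertices.
        cands = []
        for e, w in hyperedges:
            roots = frozenset(find(v) for v in e)
            if 1 < len(roots) <= n - k + 1:
                cands.append((roots, w))
        if not cands:
            break
        # Weighted random choice of the hyperedge to contract.
        r = rng.uniform(0, sum(w for _, w in cands))
        for roots, w in cands:
            r -= w
            if r <= 0:
                break
        target = next(iter(roots))
        for v in roots:  # contract all endpoints into one super-vertex
            parent[v] = target
        n -= len(roots) - 1

    return sum(w for e, w in hyperedges
               if len({find(v) for v in e}) > 1)

def kcut_threshold(vertices, hyperedges, k, alpha=1.0, iters=1000, seed=0):
    """theta = alpha * W, with W the best k-way cut over `iters` runs."""
    rng = random.Random(seed)
    best = min(random_contraction_kcut(vertices, hyperedges, k, rng)
               for _ in range(iters))
    return alpha * best
```

With α = 1.00 and 1000 iterations, as in the quoted setup, `kcut_threshold` mirrors the reported choice of θ; randomized contraction only finds the minimum cut with high probability, which is why the paper iterates KS(G, w, k) many times.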