Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficient and Effective Optimal Transport-Based Biclustering

Authors: Chakib Fettal, Lazhar Labiod, Mohamed Nadif

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We ran experiments using term-document matrices. The benefit of using biclustering on this kind of data is that the resulting biclusters contain both documents and the words that characterize them, which is helpful in interpreting the clustering of the documents. Additional experiments over synthetic and gene expression data are available in the appendix.
Researcher Affiliation	Academia	Chakib Fettal: Centre Borelli UMR 9010, Université Paris Cité; Informatique Caisse des Dépôts et Consignations; EMAIL. Lazhar Labiod: Centre Borelli UMR 9010, Université Paris Cité; EMAIL. Mohamed Nadif: Centre Borelli UMR 9010, Université Paris Cité; EMAIL.
Pseudocode	Yes	Algorithm 1: BCOT. Input: $B$, bi-adjacency matrix; $w$ and $v$, row and column weights; $r$ and $c$, row and column exemplar distributions. Output: $\pi_r$, $\pi_c$, row and column partitions. $W \leftarrow W_{\text{init}}$; while not converged do: $Z \leftarrow \operatorname{argOT}(L(B)W, w, r)$; $W \leftarrow \operatorname{argOT}(L(B)^\top Z, v, c)$; end. Generate $\pi_r$, $\pi_c$ from $Z$ and $W$.
Open Source Code	Yes	For reproducibility, we publicly release our code (https://github.com/chakib401/BCOT).
Open Datasets	Yes	We evaluate BCOT on six benchmark document-term datasets: ACM, DBLP, PubMed, Wiki, Ohscal, and 20 Newsgroups. Their characteristics are shown in Table 2. ACM [13], DBLP [13], PubMed [32] and Wiki [37] are attributed networks from which we use only the node-level features, which correspond to term-document matrices. We also selected the Ohscal collection [22] and 20 Newsgroups [26] as large-scale document-term matrices to serve as computational efficiency benchmarks.
Dataset Splits	No	The paper uses benchmark datasets but does not explicitly state the training, validation, or test split percentages or sample counts used for reproducing the experiments.
Hardware Specification	Yes	All the experiments were performed on the same machine with an Intel(R) Xeon(R) CPU and 12GB RAM.
Software Dependencies	No	For OT solvers we made use of the POT package [15].
Experiment Setup	Yes	In our experiments we define the loss function as $L(B) = cB$, where $c$ is selected from $\{1, k, d, n\}$. For BCOTλ, the regularization parameter $\lambda$ is selected from $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$. The best hyper-parameters are those that minimize the number of empty clusters. In the case of ties, we select according to the value of the Davies-Bouldin index of the partition [7]. Random restarts are not used for any of the algorithms, including k-means.
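As a rough illustration of the alternating scheme quoted in the Pseudocode row, the Python sketch below alternates two optimal transport problems: one assigns rows to row clusters given the current column plan W, the other assigns columns given the current row plan Z. It is not the authors' released code. Several choices are assumptions made only for this sketch: the hypothetical loss L(B) = -B (so stronger associations are cheaper), uniform weights w, v and exemplar distributions r, c, a random initial plan W_init, a fixed number of outer iterations instead of a convergence test, a hand-written log-domain Sinkhorn solver (entropic OT, closer to the regularized BCOTλ variant than to an exact solver) in place of the POT package, and reading the partitions off Z and W by row-wise argmax. All function names are hypothetical.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(x - m).sum(axis=axis))


def sinkhorn_log(cost, a, b, reg, n_iter=500):
    # Entropic OT plan with row marginal a and column marginal b (log-domain Sinkhorn).
    M = -cost / reg
    f, g = np.zeros_like(a), np.zeros_like(b)
    log_a, log_b = np.log(a), np.log(b)
    for _ in range(n_iter):
        f = log_a - _logsumexp(M + g[None, :], axis=1)
        g = log_b - _logsumexp(M + f[:, None], axis=0)
    return np.exp(M + f[:, None] + g[None, :])


def bcot_sketch(B, k, reg=0.1, n_outer=20, seed=0):
    n, d = B.shape
    # Uniform weights and exemplar distributions: an assumption of this sketch.
    w, v = np.full(n, 1.0 / n), np.full(d, 1.0 / d)
    r, c = np.full(k, 1.0 / k), np.full(k, 1.0 / k)
    L = -B  # hypothetical loss L(B) = -B, not the paper's stated choice
    rng = np.random.default_rng(seed)
    W = sinkhorn_log(rng.random((d, k)), v, c, reg)  # random W_init (d x k)
    for _ in range(n_outer):
        Z = sinkhorn_log(L @ W, w, r, reg)    # rows -> row clusters (n x k)
        W = sinkhorn_log(L.T @ Z, v, c, reg)  # columns -> column clusters (d x k)
    # One possible way to generate hard partitions from the transport plans.
    return Z.argmax(axis=1), W.argmax(axis=1)
```

On a block-diagonal toy matrix such as `np.kron(np.eye(2), np.ones((3, 3)))`, rows and columns from the same block are expected to share a label. A larger `reg` makes the plans more diffuse and the assignments less sharp.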
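The hyper-parameter selection rule in the Experiment Setup row (fewest empty clusters first, Davies-Bouldin index as tie-breaker) amounts to a lexicographic minimum. The sketch below is a hypothetical helper, not taken from the BCOT repository. It assumes labels are integers in range(k), that the Davies-Bouldin index is computed on the rows of a data matrix X with Euclidean distances, and that lower values are better, as is standard for this index; scikit-learn's `davies_bouldin_score` could replace the hand-written version. The candidate keys such as "c=k" are illustrative only.

```python
import numpy as np


def count_empty(labels, k):
    # Number of clusters in range(k) that received no member.
    return k - len(np.unique(labels))


def davies_bouldin(X, labels):
    # Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio; lower is better.
    ks = np.unique(labels)
    if len(ks) < 2:
        return np.inf  # undefined for a single cluster; treated as worst here
    cents = np.array([X[labels == c].mean(axis=0) for c in ks])
    scatter = np.array(
        [np.linalg.norm(X[labels == c] - cents[i], axis=1).mean() for i, c in enumerate(ks)]
    )
    dist = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(dist == 0, np.inf, dist)
    np.fill_diagonal(ratios, -np.inf)  # exclude i == j from the max
    return ratios.max(axis=1).mean()


def select_best(X, candidates, k):
    # candidates: hyper-parameter setting -> row labels.
    # Tuples compare lexicographically: empty clusters first, then DB index.
    return min(
        candidates,
        key=lambda p: (count_empty(candidates[p], k), davies_bouldin(X, candidates[p])),
    )
```

In the paper's setting, `candidates` would hold one partition per value of c (and of λ for BCOTλ).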