Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Co-clustering through Optimal Transport

Authors: Charlotte Laclau, Ievgen Redko, Basarab Matei, Younès Bennani, Vincent Brault

ICML 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 4, we evaluate our approach on synthetic and real-world data sets and show that it is accurate and substantially more efficient than the other state-of-the-art methods.
Researcher Affiliation	Academia	(1) CNRS, LIPN, Université Paris 13, Sorbonne Paris Cité, France; (2) CNRS UMR 5220, INSERM U1206, Univ. Lyon 1, INSA Lyon, F-69621 Villeurbanne, France; (3) CNRS, LJK, Univ. Grenoble-Alpes, France.
Pseudocode	Yes	The pseudocode of both approaches in Matlab are presented in Algorithm 1 and Algorithm 2, respectively.
Open Source Code	No	The paper does not provide concrete access to source code for the described methodology. No links or explicit statements of code release were found.
Open Datasets	Yes	MOVIELENS-100K [2] is a popular benchmark data set that consists of user-movie ratings, on a scale of one to five, collected from a movie recommendation service gathering 100,000 ratings from 943 users on 1682 movies. [2] https://grouplens.org/datasets/movielens/100k/
Dataset Splits	No	The paper mentions 'cross-validation' for setting regularization parameters for MovieLens, but does not specify exact split percentages, sample counts, or citations to predefined splits for training, validation, and testing. For synthetic data, it mentions generating 100 datasets but not specific splits within each.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions using Matlab for pseudocode but does not provide specific version numbers for any software, libraries, or solvers used in the experiments.
Experiment Setup	Yes	Regarding CCOT we set n_s to 1000 for all configurations except D4 which has the same number of rows and columns, and therefore does not require any sampling. For CCOT-GW, we use Gaussian kernels for both rows and columns with σ computed as the mean of all pairwise Euclidean distances between vectors (Kar & Jain, 2011).
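
The kernel bandwidth rule quoted under Experiment Setup can be illustrated with a short sketch. This is not the authors' code (none was released, and the paper's pseudocode is in Matlab); it is a minimal NumPy illustration under stated assumptions: the kernel is taken to be exp(-d² / (2σ²)), and the mean is taken over unordered pairs of distinct vectors. The paper excerpt does not confirm either choice, and the function name `gaussian_kernel` is hypothetical.

```python
import numpy as np


def gaussian_kernel(X):
    """Gaussian kernel matrix over the rows of X.

    sigma is the mean Euclidean distance over all unordered pairs of
    distinct rows (assumption: zero self-distances are excluded).
    The kernel form exp(-d^2 / (2 * sigma^2)) is also an assumption.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    dist = np.sqrt(sq_dist)
    # Indices of the strict upper triangle: each distinct pair once.
    upper = np.triu_indices(X.shape[0], k=1)
    sigma = dist[upper].mean()
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return K, sigma


# Usage: one kernel for the rows and one for the columns of a data matrix.
rng = np.random.default_rng(0)
A = rng.random((6, 4))                    # toy data matrix, 6 rows x 4 columns
K_rows, sigma_rows = gaussian_kernel(A)   # 6 x 6 kernel between rows
K_cols, sigma_cols = gaussian_kernel(A.T) # 4 x 4 kernel between columns
```

Materializing all pairwise differences costs O(n² d) memory, which is acceptable for a sketch but would need a chunked or library-based distance computation for large matrices.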