Co-clustering through Optimal Transport

Authors: Charlotte Laclau, Ievgen Redko, Basarab Matei, Younès Bennani, Vincent Brault

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 4, we evaluate our approach on synthetic and real-world data sets and show that it is accurate and substantially more efficient than the other state-of-the-art methods.
Researcher Affiliation | Academia | (1) CNRS, LIPN, Université Paris 13 Sorbonne Paris Cité, France (2) CNRS UMR 5220 INSERM U1206, Univ. Lyon 1, INSA Lyon, F-69621 Villeurbanne, France (3) CNRS, LJK, Univ. Grenoble-Alpes, France.
Pseudocode | Yes | The pseudocode of both approaches in Matlab are presented in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. No links or explicit statements of code release were found.
Open Datasets | Yes | MOVIELENS-100K (footnote 2) is a popular benchmark data set that consists of user-movie ratings, on a scale of one to five, collected from a movie recommendation service gathering 100,000 ratings from 943 users on 1682 movies. Footnote 2: https://grouplens.org/datasets/movielens/100k/
Dataset Splits | No | The paper mentions 'cross-validation' for setting regularization parameters for MovieLens, but does not specify exact split percentages, sample counts, or citations to predefined splits for training, validation, and testing. For synthetic data, it mentions generating 100 datasets but not specific splits within each.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using Matlab for pseudocode but does not provide specific version numbers for any software, libraries, or solvers used in the experiments.
Experiment Setup | Yes | Regarding CCOT we set n_s to 1000 for all configurations except D4 which has the same number of rows and columns, and therefore does not require any sampling. For CCOT-GW, we use Gaussian kernels for both rows and columns with σ computed as the mean of all pairwise Euclidean distances between vectors (Kar & Jain, 2011).
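
The kernel configuration quoted under Experiment Setup (Gaussian kernels on rows and on columns, with σ set to the mean of all pairwise Euclidean distances) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. The function names are hypothetical, the kernel form exp(-d²/(2σ²)) is an assumed parameterization, and averaging over unordered pairs of distinct vectors is also an assumption, since the quoted text specifies neither.

```python
import math
from itertools import combinations


def mean_pairwise_distance(vectors):
    """Mean Euclidean distance over all unordered pairs of distinct vectors.

    Assumption: self-distances (zeros on the diagonal) are excluded.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


def gaussian_kernel_matrix(vectors, sigma=None):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Assumption: this is one common Gaussian kernel parameterization.
    When sigma is None it defaults to the mean pairwise distance.
    """
    if sigma is None:
        sigma = mean_pairwise_distance(vectors)
    if sigma <= 0:
        raise ValueError("sigma must be positive (are all vectors identical?)")
    return [
        [math.exp(-math.dist(u, v) ** 2 / (2.0 * sigma ** 2)) for v in vectors]
        for u in vectors
    ]


# Toy data matrix: 3 rows (e.g. users) by 2 columns (e.g. movies).
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Row kernel: similarities between row vectors.
row_kernel = gaussian_kernel_matrix(data)

# Column kernel: same construction applied to the transposed matrix.
columns = [list(col) for col in zip(*data)]
col_kernel = gaussian_kernel_matrix(columns)
```

Each kernel gets its own bandwidth here, computed from the vectors it compares, which is one reading of "for both rows and columns". A shared σ could be passed explicitly through the `sigma` argument instead.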