Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multi-View Information-Theoretic Co-Clustering for Co-Occurrence Data

Authors: Peng Xu, Zhaohong Deng, Kup-Sze Choi, Longbing Cao, Shitong Wang

Pages: 379-386

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments are conducted on text and image multi-view datasets. The results clearly demonstrate the superiority of the proposed method.
Researcher Affiliation	Academia	School of Digital Media, Jiangnan University, China; Center of Smart Health, Hong Kong Polytechnic University, Hong Kong; Advanced Analytics Institute, University of Technology Sydney, Australia
Pseudocode	Yes	Algorithm 1: MV-ITCC. Input: Given $K$ views, the number of clusters $C$ for $n$ samples, the convergence threshold, the number of iterations $T$ and the multi-view dataset $\{x_i\}_{i=1}^{n}$, $x_i = \{x_i^{(v)}\}_{v=1}^{K}$. Output: The final clustering function $C_X(x_i)$ for sample $x_i$. Procedure MV-ITCC: 1: Initialize the clustering functions $C_X^{(0)}$, $C_{Y_i}^{(0)}$ for each view and initialize the weights $w_i^{(0)}$ for each view. 2: Initialize $p(X, Y_i)$ for each view based on $\{x_i\}_{i=1}^{n}$. 3: Initialize $\hat{p}^{(0)}(X, Y_i)$ for each view based on $p(X, Y_i)$, $C_X^{(0)}$, $C_{Y_i}^{(0)}$ and (5). 4: For $t = 1, 2, \ldots, T$ do 5: Update $C_X^{(t)}$ based on (14) with $p(X, Y_i)$, $\hat{p}^{(t-1)}(X, Y_i)$ in (10). 6: Update $C_{Y_i}^{(t)}$ based on (15) with $p(X, Y_i)$, $\hat{p}^{(t-1)}(X, Y_i)$ in (11). 7: Update $\hat{p}^{(t)}(X, Y_i)$ based on $p(X, Y_i)$, $C_X^{(t-1)}$, $C_{Y_i}^{(t-1)}$ and (5). 8: Update $w_i$ for each view based on (13). 9: Update $J^{(t)}$ with (4) and evaluate the convergence by comparing with $J^{(t-1)}$. 10: end for
Open Source Code	Yes	The code is available at https://github.com/DallasBuyer/MVITCC
Open Datasets	Yes	Seven co-occurring datasets are used in the experiments to evaluate the effectiveness of the proposed method... Cora dataset: It is a dataset of publications (Zhang et al. 2014). Reuters dataset: Reuters is a document collection translated into five languages, where each language is regarded as a view (Jiang et al. 2012). 3S dataset: 3S (3Sources) is a collection of stories gathered from three news websites (Zhang et al. 2014). NG20 dataset: NG20 is constructed from the NewsGroup 20 dataset according to the procedure in (Gu and Zhou 2009). Caltech dataset: Caltech is an image dataset containing 101 classes (Kumar and Rai 2011)... Corel dataset: Corel is an image classification dataset (Jiang et al. 2012)... Leaves dataset: It is an image dataset with one hundred plant species from UCI repository.
Dataset Splits	No	The paper mentions running each algorithm 30 times with different parameters to determine the best settings, which implies some form of tuning, but it does not give details on training, validation, and test splits, such as percentages, sample counts, or references to standard splits.
Hardware Specification	No	The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing resources used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions) used for replicating the experiments.
Experiment Setup	Yes	MV-ITCC: For the proposed method, the regularization parameter of the maximum entropy term, the number of clusters for features and the number of iterations are all adjustable parameters... it was enough to set the number of iterations as 20. The regularization parameter was optimally set by using search grid $\{2^{-6}, 2^{-5}, \ldots, 2^{0}, \ldots, 2^{5}, 2^{6}\}$. Each algorithm was executed for 30 times with different parameters to determine the best settings where the optimal performance was achieved...
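
The search over the regularization parameter described in the Experiment Setup row can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: `run_once` is a hypothetical stand-in for one clustering run and returns a toy score, and averaging the 30 runs per grid value is an assumption, since the quoted text does not say how the runs were aggregated.

```python
import math
import random
from statistics import mean

# Regularization grid {2^-6, 2^-5, ..., 2^0, ..., 2^5, 2^6} from the quoted setup.
GRID = [2.0 ** k for k in range(-6, 7)]
N_ITERATIONS = 20  # iteration count the paper reports as sufficient
N_RUNS = 30        # repetitions per setting; aggregating by mean is an assumption


def run_once(reg, n_iterations, seed):
    """Hypothetical placeholder for one clustering run.

    Returns a toy score that peaks at reg == 1.0 plus small seeded noise.
    n_iterations is accepted only to mirror a real run's signature.
    """
    rng = random.Random(seed)
    return 1.0 / (1.0 + abs(math.log2(reg))) + rng.uniform(-0.01, 0.01)


def grid_search(grid=GRID, n_runs=N_RUNS, n_iterations=N_ITERATIONS):
    """Return (best_reg, best_mean_score) over the grid."""
    results = {
        reg: mean(run_once(reg, n_iterations, seed) for seed in range(n_runs))
        for reg in grid
    }
    best_reg = max(results, key=results.get)
    return best_reg, results[best_reg]


if __name__ == "__main__":
    best_reg, best_score = grid_search()
    print(f"best regularization value: {best_reg}, mean score: {best_score:.3f}")
```

To use this with a real implementation, replace `run_once` with a call that runs the clustering method for `n_iterations` iterations and returns an evaluation metric. The metric and the aggregation rule should follow the paper rather than this sketch.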