Multi-View Information-Theoretic Co-Clustering for Co-Occurrence Data

Authors: Peng Xu, Zhaohong Deng, Kup-Sze Choi, Longbing Cao, Shitong Wang
Pages: 379-386

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on text and image multi-view datasets. The results clearly demonstrate the superiority of the proposed method.
Researcher Affiliation | Academia | School of Digital Media, Jiangnan University, China; Center of Smart Health, Hong Kong Polytechnic University, Hong Kong; Advanced Analytics Institute, University of Technology Sydney, Australia
Pseudocode | Yes | Algorithm 1: MV-ITCC
Input: $K$ views, the number of clusters $C$ for $n$ samples, the convergence threshold, the number of iterations $T$, and the multi-view dataset $\{x_i\}_{i=1}^{n}$ with $x_i = \{x_i^{(v)}\}_{v=1}^{K}$.
Output: the final clustering function $C_X(x_i)$ for each sample $x_i$.
Procedure MV-ITCC:
1: Initialize the clustering functions $C_X^{(0)}$ and $C_{Y_i}^{(0)}$ and the weight $w_i^{(0)}$ for each view.
2: Initialize $p_i(X, Y)$ for each view based on $\{x_i\}_{i=1}^{n}$.
3: Initialize $\hat{p}_i^{(0)}(X, Y)$ for each view based on $p_i(X, Y)$, $C_X^{(0)}$, $C_{Y_i}^{(0)}$ and (5).
4: For $t = 1, 2, \ldots, T$ do
5: Update $C_X^{(t)}$ based on (14) with $p_i(X, Y)$ and $\hat{p}_i^{(t-1)}(X, Y)$ in (10).
6: Update $C_{Y_i}^{(t)}$ based on (15) with $p_i(X, Y)$ and $\hat{p}_i^{(t-1)}(X, Y)$ in (11).
7: Update $\hat{p}_i^{(t)}(X, Y)$ based on $p_i(X, Y)$, $C_X^{(t-1)}$, $C_{Y_i}^{(t-1)}$ and (5).
8: Update $w_i$ for each view based on (13).
9: Update $J^{(t)}$ with (4) and evaluate convergence by comparing it with $J^{(t-1)}$.
10: end for
Open Source Code | Yes | The code is available at https://github.com/DallasBuyer/MVITCC
Open Datasets | Yes | Seven co-occurrence datasets are used in the experiments to evaluate the effectiveness of the proposed method...
Cora dataset: Cora is a dataset of publications (Zhang et al. 2014).
Reuters dataset: Reuters is a document collection translated into five languages, where each language is regarded as a view (Jiang et al. 2012).
3S dataset: 3S (3Sources) is a collection of stories gathered from three news websites (Zhang et al. 2014).
NG20 dataset: NG20 is constructed from the NewsGroup 20 dataset according to the procedure in (Gu and Zhou 2009).
Caltech dataset: Caltech is an image dataset containing 101 classes (Kumar and Rai 2011)...
Corel dataset: Corel is an image classification dataset (Jiang et al. 2012)...
Leaves dataset: Leaves is an image dataset of one hundred plant species from the UCI repository.
Dataset Splits | No | The paper mentions running each algorithm 30 times with different parameters to determine the best settings, which implies some form of tuning, but it does not specify training, validation, or test splits (percentages, sample counts, or references to standard splits).
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing resources used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions) used for replicating the experiments.
Experiment Setup | Yes | MV-ITCC: For the proposed method, the regularization parameter of the maximum entropy term, the number of feature clusters, and the number of iterations are all adjustable parameters... setting the number of iterations to 20 was sufficient. The regularization parameter was tuned by grid search over $\{2^{-6}, 2^{-5}, \ldots, 2^{0}, \ldots, 2^{5}, 2^{6}\}$. Each algorithm was executed 30 times with different parameters to determine the settings under which optimal performance was achieved...
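As a reading aid for Algorithm 1 in the Pseudocode row, the loop can be sketched in NumPy. This is not the authors' code (that is in the linked repository). Equations (4), (5), (10), (11) and (13)-(15) are not reproduced in this report, so the sketch assumes the standard information-theoretic co-clustering (ITCC) forms of Dhillon, Mallela and Modha (2003). Under those forms, $\hat p_i(x, y) = p_i(\hat x, \hat y)\, p_i(x \mid \hat x)\, p_i(y \mid \hat y)$, and rows and columns are assigned by minimum KL divergence. Each view has the loss $D_i = \mathrm{KL}(p_i \,\|\, \hat p_i)$. The maximum-entropy weights are assumed to take the form $w_i \propto \exp(-D_i/\lambda)$, giving the objective $J = \sum_i w_i D_i + \lambda \sum_i w_i \log w_i$. The names `mv_itcc`, `stats`, `kl_rows` and `kl_total`, the parameter `lam`, and the single feature-cluster count `ky` shared by all views are hypothetical choices.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def kl_rows(P, Q):
    """KL(P[a] || Q[b]) for every row a of P and row b of Q -> shape (len(P), len(Q))."""
    diff = np.log(P[:, None, :] + EPS) - np.log(Q[None, :, :] + EPS)
    return np.sum(P[:, None, :] * diff, axis=2)


def stats(p, rx, cy, kx, ky):
    """Cluster-level statistics of one view's joint distribution p(x, y)."""
    R, S = np.eye(kx)[rx], np.eye(ky)[cy]      # one-hot row / column cluster indicators
    pb = R.T @ p @ S                            # p(x_hat, y_hat), shape (kx, ky)
    px, py = p.sum(axis=1), p.sum(axis=0)       # marginals p(x), p(y)
    pxh, pyh = R.T @ px, S.T @ py               # cluster marginals p(x_hat), p(y_hat)
    pxg = px / (pxh[rx] + EPS)                  # p(x | x_hat(x))
    pyg = py / (pyh[cy] + EPS)                  # p(y | y_hat(y))
    return pb, pxg, pyg, pxh, pyh


def kl_total(p, rx, cy, kx, ky):
    """Per-view loss D = KL(p || p_hat), p_hat(x,y) = p(x_hat,y_hat) p(x|x_hat) p(y|y_hat)."""
    pb, pxg, pyg, _, _ = stats(p, rx, cy, kx, ky)
    q = pb[np.ix_(rx, cy)] * np.outer(pxg, pyg)
    return float(np.sum(p * (np.log(p + EPS) - np.log(q + EPS))))


def mv_itcc(views, kx, ky, lam=1.0, T=20, tol=1e-6, seed=0):
    """views: list of nonnegative (n_samples, n_features_v) co-occurrence matrices."""
    rng = np.random.default_rng(seed)
    ps = [v / v.sum() for v in views]                       # step 2: joint p_i(X, Y)
    n, K = ps[0].shape[0], len(ps)
    rx = rng.integers(kx, size=n)                           # step 1: C_X^(0)
    cys = [rng.integers(ky, size=p.shape[1]) for p in ps]   # step 1: C_{Y_i}^(0)
    w = np.full(K, 1.0 / K)                                 # step 1: w_i^(0)
    J_prev = np.inf
    for _ in range(T):                                      # step 4
        st = [stats(p, rx, cy, kx, ky) for p, cy in zip(ps, cys)]  # p_hat_i^(t-1)
        # Step 5: x -> argmin_c sum_i w_i p_i(x) KL(p_i(Y|x) || p_hat_i(Y|c))
        cost = np.zeros((n, kx))
        for wi, p, cy, (pb, _, pyg, pxh, _) in zip(w, ps, cys, st):
            px = p.sum(axis=1)
            q_y_xh = pb[:, cy] * pyg[None, :] / (pxh[:, None] + EPS)
            cost += wi * px[:, None] * kl_rows(p / (px[:, None] + EPS), q_y_xh)
        # Step 6: per view, y -> argmin_d KL(p_i(X|y) || p_hat_i(X|d))
        new_cys = []
        for p, (pb, pxg, _, _, pyh) in zip(ps, st):
            py = p.sum(axis=0)
            q_x_yh = pxg[:, None] * pb[rx, :] / (pyh[None, :] + EPS)
            new_cys.append(kl_rows((p / (py[None, :] + EPS)).T, q_x_yh.T).argmin(axis=1))
        rx, cys = cost.argmin(axis=1), new_cys
        # Step 7: per-view loss of the approximation built from the new assignments
        D = np.array([kl_total(p, rx, cy, kx, ky) for p, cy in zip(ps, cys)])
        # Step 8 (assumed form): maximum-entropy weights, w_i proportional to exp(-D_i / lam)
        w = np.exp(-(D - D.min()) / lam)
        w /= w.sum()
        # Step 9: objective and convergence check against the previous iteration
        J = float(np.sum(w * D) + lam * np.sum(w * np.log(w + EPS)))
        if abs(J_prev - J) < tol:
            break
        J_prev = J
    return rx, cys, w
```

For example, you could call `mv_itcc([v1, v2], kx=3, ky=4, lam=0.5)` on two count matrices that share the same 30 rows. It returns 30 sample labels, one feature-label array per view, and the two view weights. The default `T=20` mirrors the iteration count reported in the Experiment Setup row. Steps 5 and 6 both read the previous-iteration approximation, as in Algorithm 1.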
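The tuning protocol in the Experiment Setup row can be sketched as a grid search over the 13 values $2^{-6}, \ldots, 2^{6}$. The row does not say how the 30 runs are aggregated. This sketch therefore assumes they are repeated runs with different random seeds and keeps the $\lambda$ with the best mean score. The callback `run(lam, seed)`, the function `grid_search`, and the constant `LAMBDA_GRID` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Regularization grid {2^-6, 2^-5, ..., 2^0, ..., 2^5, 2^6}: 13 candidate values
LAMBDA_GRID = [2.0 ** k for k in range(-6, 7)]


def grid_search(
    run: Callable[[float, int], float],
    grid: Sequence[float] = LAMBDA_GRID,
    repeats: int = 30,
) -> Tuple[float, float]:
    """Return (best_lambda, best_mean_score) over `grid`.

    `run(lam, seed)` is a hypothetical callback that clusters the data with
    regularization `lam` and random seed `seed`, then returns a quality score
    where higher is better (e.g. accuracy or NMI against ground-truth labels).
    """
    best_lam, best_score = grid[0], float("-inf")
    for lam in grid:
        # Average over `repeats` runs (assumed aggregation; the setup only says 30 runs)
        score = mean(run(lam, seed) for seed in range(repeats))
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```

With a toy score that peaks at $\lambda = 1$, such as `grid_search(lambda lam, seed: -abs(lam - 1.0))`, the search selects `1.0`. Averaging over seeds, rather than taking the single best run, is one reasonable reading of the protocol. It is less optimistic when random initialization changes the result.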