Multi-View Information-Theoretic Co-Clustering for Co-Occurrence Data
Authors: Peng Xu, Zhaohong Deng, Kup-Sze Choi, Longbing Cao, Shitong Wang
Pages: 379-386
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on text and image multi-view datasets. The results clearly demonstrate the superiority of the proposed method. |
| Researcher Affiliation | Academia | School of Digital Media, Jiangnan University, China; Center of Smart Health, Hong Kong Polytechnic University, Hong Kong; Advanced Analytics Institute, University of Technology Sydney, Australia |
| Pseudocode | Yes | Algorithm 1: MV-ITCC. Input: Given K views, the number of clusters C for n samples, the convergence threshold, the number of iterations T, and the multi-view dataset $\{x_i\}_{i=1}^{n}$, $x_i = \{x_i^{(v)}\}_{v=1}^{K}$. Output: The final clustering function $C_X(x_i)$ for sample $x_i$. Procedure MV-ITCC: 1: Initialize the clustering functions $C_X^{(0)}$, $C_{Y_i}^{(0)}$ for each view and initialize the weights $w_i^{(0)}$ for each view. 2: Initialize $p(X, Y_i)$ for each view based on $\{x_i\}_{i=1}^{n}$. 3: Initialize $\hat{p}^{(0)}(X, Y_i)$ for each view based on $p(X, Y_i)$, $C_X^{(0)}$, $C_{Y_i}^{(0)}$ and (5). 4: For $t = 1, 2, \ldots, T$ do 5: Update $C_X^{(t)}$ based on (14) with $p(X, Y_i)$, $\hat{p}^{(t-1)}(X, Y_i)$ in (10). 6: Update $C_{Y_i}^{(t)}$ based on (15) with $p(X, Y_i)$, $\hat{p}^{(t-1)}(X, Y_i)$ in (11). 7: Update $\hat{p}^{(t)}(X, Y_i)$ based on $p(X, Y_i)$, $C_X^{(t-1)}$, $C_{Y_i}^{(t-1)}$ and (5). 8: Update $w_i$ for each view based on (13). 9: Update $J^{(t)}$ with (4) and evaluate the convergence by comparing with $J^{(t-1)}$. 10: end for |
| Open Source Code | Yes | The code is available at https://github.com/DallasBuyer/MVITCC |
| Open Datasets | Yes | Seven co-occurring datasets are used in the experiments to evaluate the effectiveness of the proposed method... Cora dataset: It is a dataset of publications (Zhang et al. 2014). Reuters dataset: Reuters is a document collection translated into five languages, where each language is regarded as a view (Jiang et al. 2012). 3S dataset: 3S (3Sources) is a collection of stories gathered from three news websites (Zhang et al. 2014). NG20 dataset: NG20 is constructed from the NewsGroup 20 dataset according to the procedure in (Gu and Zhou 2009). Caltech dataset: Caltech is an image dataset containing 101 classes (Kumar and Rai 2011)... Corel dataset: Corel is an image classification dataset (Jiang et al. 2012)... Leaves dataset: It is an image dataset with one hundred plant species from the UCI repository. |
| Dataset Splits | No | The paper mentions running each algorithm 30 times with different parameters to determine the best settings, which implies some form of tuning, but it gives no details on training, validation, and test splits, such as percentages, sample counts, or references to standard splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions) used for replicating the experiments. |
| Experiment Setup | Yes | MV-ITCC: For the proposed method, the regularization parameter of the maximum entropy term, the number of clusters for features and the number of iterations are all adjustable parameters... it was enough to set the number of iterations as 20. The regularization parameter was optimally set by using the search grid $\{2^{-6}, 2^{-5}, \ldots, 2^{0}, \ldots, 2^{5}, 2^{6}\}$. Each algorithm was executed 30 times with different parameters to determine the best settings where the optimal performance was achieved... |
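
The tuning procedure described in the Experiment Setup row could be sketched roughly as follows. This is a minimal illustration, not the authors' code: `select_best` and the `run_fn` callback are hypothetical names, and the protocol shown (every grid value run 30 times, runs averaged, highest mean kept) is an assumption, since the paper excerpt does not spell out how the 30 executions map onto the grid. Only the grid itself, the 20 iterations, and the 30 executions come from the table.

```python
import random

# Search grid for the regularization parameter: 2^-6, 2^-5, ..., 2^0, ..., 2^6.
REG_GRID = [2.0 ** k for k in range(-6, 7)]
N_ITERATIONS = 20  # iteration count reported as sufficient
N_RUNS = 30        # number of executions reported per algorithm


def select_best(run_fn, grid=REG_GRID, n_runs=N_RUNS, seed=0):
    """Return (mean_score, reg_param) for the grid value with the highest mean.

    run_fn(reg_param, n_iterations, run_seed) is a hypothetical callback that
    performs one clustering run and returns a score where higher is better.
    Averaging over runs is an assumption, not something the source states.
    """
    rng = random.Random(seed)
    best = None
    for reg_param in grid:
        scores = [
            run_fn(reg_param, N_ITERATIONS, rng.randrange(2 ** 31))
            for _ in range(n_runs)
        ]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, reg_param)
    return best
```

Any scoring function with that three-argument signature can be passed as `run_fn`; an actual MV-ITCC run would have to be supplied by the reader, for example by wrapping the released code.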