Integrating Vision-Language Semantic Graphs in Multi-View Clustering
Authors: JunLong Ke, Zichen Wen, Yechenhao Yang, Chenhang Cui, Yazhou Ren, Xiaorong Pu, Lifang He
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce an effective unsupervised approach for creating semantic graphs from image multi-view datasets using pre-trained encoders. Our method addresses the inherent spatial noise and imbalance in these encoders by employing graph filters and a joint process that integrates both image node and edge features. Additionally, we demonstrate the application of our approach to multi-view image clustering on extensive datasets, notably the high-resolution MVImgNet, achieving an impressive 82% accuracy. (Section 4: Experiments) |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, University of Electronic Science and Technology of China; (2) Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China; (3) Department of Computer Science and Engineering, Lehigh University. 2021150901020@std.uestc.edu.cn, Zichen.Wen@outlook.com, {yechenhaoyang, chenhangcui}@gmail.com, {yazhou.ren, puxiaor}@uestc.edu.cn, lih319@lehigh.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links for open-source code for the described methodology. |
| Open Datasets | Yes | As shown in Table 1, we use the following four real-world multi-view datasets in our study. MNIST [LeCun et al., 1998] is a widely used dataset of handwritten digits from 0 to 9. The Fashion dataset [Xiao et al., 2017] comprises images of various fashion items, including T-shirts, dresses, coats, etc. The COIL dataset [Nene et al., 1996] contains images of various objects, such as cups, ducks, and blocks, shown in different poses. MVImgNet [Yu et al., 2023] is a multi-view image dataset presented with a large scale, high accuracy, and large diversity... |
| Dataset Splits | No | The paper describes the datasets used but does not explicitly provide training/validation/test dataset splits. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We maintain the threshold hyperparameter ε at 0.05. Additionally, ε_th is defined as a dynamic threshold that is set such that the number of nouns meeting the criterion outlined in Eq. (3) is equal to the hyperparameter β for each cluster label i (i.e., the total number of nouns selected equals β·N_Class, where N_Class refers to the number of classes in the image dataset). k is the order of the filter. The sensitivity analysis for the order is shown in Figure 3. |
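
The dynamic threshold in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' implementation. Eq. (3) is not reproduced on this page, so the sketch assumes that each noun has one relevance score per cluster label and that the criterion is "score at least ε_th". The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_top_beta(scores: Sequence[float], beta: int) -> Tuple[float, List[int]]:
    """Pick the beta highest-scoring nouns for one cluster label.

    Assumption: the selection criterion is `score >= threshold`, so the
    dynamic threshold is the beta-th largest score. Ties are broken by
    keeping the noun with the lower index.
    """
    if not 1 <= beta <= len(scores):
        raise ValueError("beta must be between 1 and the number of nouns")
    # Noun indices ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    chosen = ranked[:beta]
    # The lowest score among the chosen nouns acts as the threshold.
    return scores[chosen[-1]], chosen


def select_nouns_per_cluster(
    score_rows: Sequence[Sequence[float]], beta: int
) -> List[Tuple[float, List[int]]]:
    """Apply the per-cluster selection to every cluster label."""
    return [select_top_beta(row, beta) for row in score_rows]


# Hypothetical scores: 2 cluster labels (rows) x 4 candidate nouns (columns).
toy_scores = [
    [0.1, 0.9, 0.5, 0.3],
    [0.8, 0.2, 0.7, 0.4],
]
results = select_nouns_per_cluster(toy_scores, beta=2)
total_selected = sum(len(chosen) for _, chosen in results)
```

With β = 2 and two cluster labels, each label keeps two nouns, so the total is β·N_Class = 4, matching the count described in the setup. Each label gets its own threshold because the score distributions differ across labels.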