Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering
Authors: Ruimeng Liu, Xin Zou, Chang Tang, Xiao Zheng, Xingchen Hu, Kun Sun, Xinwang Liu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SparseMVC achieves state-of-the-art clustering performance. Our framework advances the field by extending sparsity handling from the data-level to view-level and mitigating the adverse effects of encoding discrepancies through sample-level dynamic weighting. |
| Researcher Affiliation | Academia | 1School of Computer Science, China University of Geosciences (Wuhan) 2The Hong Kong University of Science and Technology (Guangzhou) 3School of Software Engineering, Huazhong University of Science and Technology 4School of Computer Science, Hubei University of Technology 5College of Systems Engineering, National University of Defense Technology |
| Pseudocode | Yes | A.1 Algorithm. The training procedure for SparseMVC is described in Algorithm 1. Algorithm 1: Training Steps for SparseMVC. Input: Multi-view data $\{X^v\}_{v=1}^{V}$, cluster number $K$, and numbers of training epochs $E_{pre}$, $E_{con}$. Output: Late-stage fusion representation $Y$. 1: Initialize random seed and select Adam optimizer. 2: for epoch = 1 : $E_{pre} + E_{con}$ do 3: Update $\{Z^v\}_{v=1}^{V}$ by minimizing $\{L^v_{recon}\}_{v=1}^{V}$ and $\{L^v_{entropy}\}_{v=1}^{V}$ utilizing Eqs. (2) and (4). 4: Update $Z$, formed by the concatenation of $\{Z^v\}_{v=1}^{V}$, utilizing Eq. (2) and Eq. (4). 5: if epoch > $E_{pre}$ then 6: Update weights $\{W^v\}_{v=1}^{V}$ by Eq. (9). 7: Update $Y$ by minimizing $L_{CDA}$ utilizing Eq. (13). 8: end if 9: end for 10: Perform K-means clustering on representation $Y$. |
| Open Source Code | Yes | The source code is publicly available at https://github.com/cleste-pome/SparseMVC. |
| Open Datasets | Yes | Benchmark Datasets. The selected datasets span diverse domains: Image datasets include MSRCV1 [53] focusing on objects and scenes, Dermatology [54] on medical images, Out-Scene [55] on natural scenes, and ALOI-100 [56] on object recognition. Image-text datasets include Wikipedia 2, which provides website crossmodal data. Omics datasets include LGG [57] focusing on brain tumor genomics and BRCA [58] on breast cancer genomics. Synthetic3d [59] supports 3D object modeling and recognition. Detailed properties of datasets are listed in Table 2. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes training epochs and clustering methods applied to the datasets, but not how the datasets themselves were partitioned for evaluation. |
| Hardware Specification | Yes | All experiments were conducted using Python 3.8.15 and PyTorch 1.13.1+cu116 on a Windows PC equipped with an AMD Ryzen 9 5900HX CPU, 32GB RAM, and an Nvidia RTX 3080 GPU (16GB). |
| Software Dependencies | Yes | All experiments were conducted using Python 3.8.15 and PyTorch 1.13.1+cu116 on a Windows PC equipped with an AMD Ryzen 9 5900HX CPU, 32GB RAM, and an Nvidia RTX 3080 GPU (16GB). |
| Experiment Setup | Yes | Implementation Details. All experiments were conducted using Python 3.8.15 and PyTorch 1.13.1+cu116 on a Windows PC equipped with an AMD Ryzen 9 5900HX CPU, 32GB RAM, and an Nvidia RTX 3080 GPU (16GB). Models were trained using the Adam optimizer [60], a learning rate of 0.003, and a fixed seed of 50, with batch size equal to the dataset's sample count. Pre-training was performed uniformly for 300 epochs, while alignment training was conducted for 300 epochs for datasets with less than 2500 samples and 1000 epochs for larger datasets. For clustering, k-means [61] was applied with the number of clusters equal to the dataset categories and 100 initializations. During pre-training, global features Zv derived from early fusion were used, while alignment training used late fusion features Y. Metrics were calculated as the average of 10 runs in the final epoch, with no fine-tuning performed for specific datasets. To ensure fairness, the hyperparameters for the comparison methods were determined based on either the default global settings or the configuration of the first dataset. |
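
The settings quoted in the Experiment Setup row, together with the two-phase control flow of Algorithm 1 in the Pseudocode row, can be summarized in a short sketch. This is an illustration only and is not taken from the authors' repository: the names `TrainConfig` and `phase_schedule` and the step labels are hypothetical, and treating a dataset of exactly 2500 samples as "larger" is an assumption, since the quoted text only separates "less than 2500 samples" from "larger datasets".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the quoted settings (not the authors' code)."""
    n_samples: int
    n_clusters: int               # set equal to the number of dataset categories
    learning_rate: float = 0.003  # Adam learning rate, as quoted
    seed: int = 50                # fixed seed, as quoted
    pretrain_epochs: int = 300    # E_pre, as quoted
    kmeans_n_init: int = 100      # k-means initializations, as quoted

    @property
    def batch_size(self) -> int:
        # Full-batch training: batch size equals the dataset's sample count.
        return self.n_samples

    @property
    def align_epochs(self) -> int:
        # E_con: 300 epochs below 2500 samples, 1000 otherwise.
        # Assumption: exactly 2500 samples counts as a "larger" dataset.
        return 300 if self.n_samples < 2500 else 1000


def phase_schedule(cfg: TrainConfig):
    """Yield (epoch, step labels) following the loop structure of Algorithm 1."""
    total = cfg.pretrain_epochs + cfg.align_epochs
    for epoch in range(1, total + 1):
        # Lines 3-4: update per-view and concatenated representations every epoch.
        steps = ["update_view_latents", "update_global_latent"]
        if epoch > cfg.pretrain_epochs:
            # Lines 6-7: alignment phase adds view weighting and late fusion.
            steps += ["update_view_weights", "update_late_fusion"]
        yield epoch, steps
```

For a hypothetical dataset of 1000 samples, `TrainConfig(n_samples=1000, n_clusters=5)` gives a batch size of 1000 and a schedule of 600 epochs, with the alignment steps appearing from epoch 301 onward. K-means on the late-fusion representation (line 10 of Algorithm 1) would follow the loop and is not modeled here.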