Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

Authors: Xiaojian Ding, Lin Zhao, Xian Li, Xiaoying Zhu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that HSACC significantly outperforms state-of-the-art methods on five benchmark datasets. Ablation studies validate the effectiveness of the hierarchical alignment and dynamic weighting mechanisms, while parameter analysis confirms the model's robustness to hyperparameter variations.
Researcher Affiliation Academia Xiaojian Ding, Lin Zhao, Xian Li, Xiaoying Zhu. School of Computer and Artificial Intelligence, Nanjing University of Finance and Economics, Nanjing, China
Pseudocode Yes Algorithm 1: Incomplete Multi-View Clustering via Hierarchical Semantic Alignment and Cooperative Completion
Open Source Code Yes The code is available at https://github.com/XiaojianDing/2025-NeurIPS-HSACC.
Open Datasets Yes Datasets. To evaluate the effectiveness of the proposed method, we selected five representative datasets. LandUse_21 [34] contains 2,100 remote sensing images from 21 categories. Noisy MNIST [35] consists of noisy handwritten digit images, with the original images as View 1 and Gaussian-noised images as View 2. Caltech101-20 [36] contains 2,386 images from 20 categories using HOG and GIST features. Hdigit [37] contains 10,000 handwritten digit images from 10 categories. 100leaves [38] contains 1,600 samples from 100 categories.
Dataset Splits No The paper does not explicitly provide specific training/test/validation dataset splits, percentages, or absolute sample counts for reproducibility. It mentions using benchmark datasets and varying missing rates, but not how the data was partitioned for training and evaluation.
Hardware Specification Yes All experiments were conducted on an NVIDIA RTX 4070 GPU using PyTorch 2.3.1.
Software Dependencies Yes All experiments were conducted on an NVIDIA RTX 4070 GPU using PyTorch 2.3.1.
Experiment Setup Yes In our experiments, each dataset underwent E training epochs (e.g., E = 500), and the computation of the inference loss was introduced starting from the E1-th epoch (e.g., E1 = 100). The learning rate was set to 0.0001, and the batch size was 256.
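
The experiment setup quoted above can be read as a small schedule: E epochs in total, with the inference loss switched on from the E1-th epoch. The following is a minimal sketch of that reading, not the authors' code. The class and function names are hypothetical, and two points are assumptions: that epochs are counted from 1 with the E1-th epoch included, and that every sample is used for training with the last partial batch kept (the report notes that the paper gives no dataset splits).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the reported settings (names are illustrative)."""
    epochs: int = 500            # E, example value quoted in the setup
    inference_start: int = 100   # E1, example value quoted in the setup
    learning_rate: float = 1e-4  # reported learning rate
    batch_size: int = 256        # reported batch size


def uses_inference_loss(epoch: int, cfg: TrainConfig) -> bool:
    # Assumption: epochs are 1-indexed and the E1-th epoch itself is included.
    return epoch >= cfg.inference_start


def batches_per_epoch(n_samples: int, cfg: TrainConfig) -> int:
    # Assumption: all samples are used and the last partial batch is kept.
    return math.ceil(n_samples / cfg.batch_size)


cfg = TrainConfig()
epochs_with_inference = sum(
    uses_inference_loss(e, cfg) for e in range(1, cfg.epochs + 1)
)
print(epochs_with_inference)           # 401 epochs (100 through 500)
print(batches_per_epoch(10_000, cfg))  # 40 batches for Hdigit's 10,000 samples
```

Under these assumptions, the inference loss is active for 401 of the 500 epochs. A different indexing convention, or dropping the last partial batch, would shift both numbers slightly.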