Interactive Deep Clustering via Value Mining
Authors: Honglin Liu, Peng Hu, Changqing Zhang, Yunfan Li, Xi Peng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that IDC can remarkably improve the performance of various pre-trained clustering models at only a low user interaction cost. In this section, we first apply the proposed IDC to two state-of-the-art deep clustering methods and evaluate the performance on five widely used image clustering benchmarks. Then we conduct ablation studies and parameter analyses to validate the robustness and effectiveness of IDC. |
| Researcher Affiliation | Academia | Honglin Liu (1), Peng Hu (1), Changqing Zhang (2,3), Yunfan Li (1), Xi Peng (1,4). (1) College of Computer Science, Sichuan University, Chengdu, China; (2) College of Intelligence and Computing, Tianjin University, Tianjin, China; (3) Tianjin Key Lab of Machine Learning, Tianjin, China; (4) State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China |
| Pseudocode | Yes | Algorithm 1 Valuable Sample Selection |
| Open Source Code | Yes | The code could be accessed at pengxi.me. |
| Open Datasets | Yes | We evaluate IDC on five widely used image clustering datasets, including CIFAR-10 [17], CIFAR-20 [17], STL-10 [8], ImageNet-10 [5] and ImageNet-Dogs [5], as detailed in Table 1. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for a validation set. Table 1 only specifies 'Train+Test' or 'Train' splits. |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia RTX 3090 GPU on the Ubuntu 20.04 platform with CUDA 12.0. |
| Software Dependencies | Yes | All experiments are conducted on a single Nvidia RTX 3090 GPU on the Ubuntu 20.04 platform with CUDA 12.0. |
| Experiment Setup | Yes | For user interaction, we select the top M = 500 valuable samples with the highest vi scores in our experiments. For each selected sample, we provide T = 5 nearest cluster center candidates. In the model optimization stage, we finetune the pre-trained clustering model for 100 epochs. To balance the effect of user feedback and model regularization, we use two independent data loaders for the inquiry and confident samples, with batch sizes of 100 and 500, respectively. τ = 0.99 is the confidence threshold. |
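The Experiment Setup row names three concrete mechanisms: selecting the top M = 500 samples by value score v_i, presenting T = 5 nearest cluster-center candidates per selected sample, and splitting data into inquiry and confident subsets with a confidence threshold τ = 0.99. The paper's actual value score and training losses are not quoted here, so the sketch below only illustrates those three steps with NumPy; all function names (`select_valuable_samples`, `candidate_clusters`, `confident_mask`) are illustrative, not taken from the released code at pengxi.me.

```python
import numpy as np

def select_valuable_samples(scores, M=500):
    """Return indices of the top-M samples ranked by value score v_i
    (highest first); these are the samples sent to the user for feedback."""
    return np.argsort(scores)[::-1][:M]

def candidate_clusters(features, centers, T=5):
    """For each sample, return the indices of its T nearest cluster
    centers (Euclidean distance) as the candidate list shown to the user."""
    # Pairwise distances: (num_samples, num_centers)
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)[:, :T]

def confident_mask(probs, tau=0.99):
    """Mark samples whose maximum cluster probability exceeds tau as
    'confident'; these feed the second (regularization) data loader."""
    return probs.max(axis=1) >= tau
```

In a finetuning loop matching the quoted setup, one would draw batches of 100 from the inquiry samples and 500 from the confident samples via two independent data loaders, for 100 epochs.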