Interactive Deep Clustering via Value Mining

Authors: Honglin Liu, Peng Hu, Changqing Zhang, Yunfan Li, Xi Peng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that IDC could remarkably improve the performance of various pre-trained clustering models, at the expense of low user interaction costs. In this section, we first apply the proposed IDC to two state-of-the-art deep clustering methods, and evaluate the performance on five widely used image clustering benchmarks. Then we conduct ablation studies and parameter analyses to validate the robustness and effectiveness of IDC."
Researcher Affiliation | Academia | "Honglin Liu^1, Peng Hu^1, Changqing Zhang^{2,3}, Yunfan Li^1, Xi Peng^{1,4}. ^1 College of Computer Science, Sichuan University, Chengdu, China; ^2 College of Intelligence and Computing, Tianjin University, Tianjin, China; ^3 Tianjin Key Lab of Machine Learning, Tianjin, China; ^4 State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, China"
Pseudocode | Yes | "Algorithm 1 Valuable Sample Selection" (a hedged sketch of this selection step is given after the table)
Open Source Code | Yes | "The code could be accessed at pengxi.me."
Open Datasets | Yes | "We evaluate IDC on five widely used image clustering datasets, including CIFAR-10 [17], CIFAR-20 [17], STL-10 [8], ImageNet-10 [5] and ImageNet-Dogs [5], as detailed in Table 1."
Dataset Splits | No | The paper does not specify a validation split: no exact percentages, sample counts, or splitting methodology are given, and Table 1 only lists 'Train+Test' or 'Train' splits.
Hardware Specification | Yes | "All experiments are conducted on a single Nvidia RTX 3090 GPU on the Ubuntu 20.04 platform with CUDA 12.0."
Software Dependencies | Yes | The same sentence doubles as the software specification (Ubuntu 20.04, CUDA 12.0): "All experiments are conducted on a single Nvidia RTX 3090 GPU on the Ubuntu 20.04 platform with CUDA 12.0." No further dependency versions are listed.
Experiment Setup | Yes | "For user interaction, we select the top M = 500 valuable samples with the highest v_i scores in our experiments. For each selected sample, we provide T = 5 nearest cluster center candidates. In the model optimization stage, we finetune the pre-trained clustering model for 100 epochs. To balance the effect of user feedback and model regularization, we use two independent data loaders for the inquiry and confident samples, with batch sizes of 100 and 500, respectively. τ = 0.99 is the confidence threshold." (a hedged configuration sketch is given after the table)
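
The selection step quoted above (top M = 500 samples by value score v_i, each paired with its T = 5 nearest cluster-center candidates) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the value scores are assumed to be precomputed, since the paper's definition of v_i is not reproduced in the excerpts above, and the names select_valuable_samples, features, and centers are hypothetical.

import torch

def select_valuable_samples(value_scores, features, centers, M=500, T=5):
    # value_scores: (N,) tensor of v_i scores, higher = more valuable (assumed precomputed)
    # features:     (N, D) sample embeddings from the pre-trained clustering model
    # centers:      (K, D) cluster centers
    top_idx = torch.topk(value_scores, k=M).indices             # top-M valuable samples
    dists = torch.cdist(features[top_idx], centers)             # (M, K) sample-to-center distances
    candidates = torch.topk(dists, k=T, largest=False).indices  # T nearest centers per sample
    return top_idx, candidates

Each selected sample would then be shown to the user alongside its T candidate clusters, with the user's choice presumably supplying the label used as inquiry supervision during fine-tuning.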
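
The optimization stage can likewise be sketched under the reported settings: 100 epochs, an inquiry loader with batch size 100, a confident-sample loader with batch size 500, and confidence threshold τ = 0.99. The losses here are assumptions (plain cross-entropy on user-confirmed labels, pseudo-label cross-entropy on confident samples); the paper's exact objectives are not quoted above, and finetune, inquiry_set, and unlabeled_x are hypothetical names.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from itertools import cycle

def finetune(model, inquiry_set, unlabeled_x, optimizer, epochs=100, tau=0.99):
    # Confident samples: predictions whose max softmax probability exceeds tau = 0.99.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > tau
    confident_set = TensorDataset(unlabeled_x[keep], pseudo[keep])

    # Two independent loaders with batch sizes 100 (inquiry) and 500 (confident), as reported.
    inquiry_loader = DataLoader(inquiry_set, batch_size=100, shuffle=True)
    confident_loader = DataLoader(confident_set, batch_size=500, shuffle=True)

    for _ in range(epochs):
        for (x_i, y_i), (x_c, y_c) in zip(inquiry_loader, cycle(confident_loader)):
            # Assumed objective: user-feedback loss plus pseudo-label regularization.
            loss = F.cross_entropy(model(x_i), y_i) + F.cross_entropy(model(x_c), y_c)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

cycle() simply re-iterates the confident loader so every inquiry batch is paired with a confident batch; how IDC actually interleaves the two loaders is not specified in the excerpts.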