MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering

Authors: Tsung Wei Tsai, Chongxuan Li, Jun Zhu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate the clustering performance of MiCE on four widely adopted natural image datasets. MiCE achieves significantly better results than various previous methods and a strong contrastive learning baseline. We present experimental results to demonstrate the effectiveness of MiCE.
Researcher Affiliation | Collaboration | Tsung Wei Tsai, Chongxuan Li, Jun Zhu; Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084, China; {peter83112414,chongxuanli1991}@gmail.com, dcszj@mail.tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Pseudocode of MiCE in a PyTorch-like style
Open Source Code | Yes | Code is available at: https://github.com/TsungWeiTsai/MiCE
Open Datasets | Yes | We evaluate the clustering performance of MiCE on four widely adopted natural image datasets, including STL-10 (Coates et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and ImageNet-Dog (Chang et al., 2017).
Dataset Splits | No | As it is often infeasible to tune the hyper-parameters with a validation dataset in real-world clustering tasks (Ghasedi Dizaji et al., 2017), we set both temperatures τ and κ as 1.0.
Hardware Specification | Yes | For all four datasets, experiments are conducted on a single GPU (NVIDIA GeForce GTX 1080 Ti).
Software Dependencies | No | The paper mentions implementing the pseudocode in a "PyTorch-like style" and using SGD as the optimizer, but it does not specify version numbers for PyTorch, Python, CUDA, or any other software dependencies.
Experiment Setup | Yes | We set both temperatures τ and κ as 1.0, and the batch size as 256. We set the SGD weight decay as 0.0001 and the SGD momentum as 0.9 (He et al., 2020). The learning rate is initiated as 1.0 and is multiplied by 0.1 at three different epochs. For CIFAR-10/100, we train for 1000 epochs in total and multiply the learning rate by 0.1 at 480, 640, and 800 epochs. For STL-10, the total epochs are 6000 and the learning rate is multiplied by 0.1 at 3000, 4000, and 5000 epochs. Lastly, for ImageNet-Dog, the total epochs are 3000 and the learning rate is multiplied by 0.1 at 1500, 2000, and 2500 epochs.
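For the datasets listed in the Open Datasets row, the sketch below shows one way they could be loaded. It assumes torchvision, which the paper does not mention, and it omits ImageNet-Dog, a curated subset of ImageNet (Chang et al., 2017) that has no torchvision loader.

```python
# Hypothetical loading of the three torchvision-hosted datasets named in the
# Open Datasets row; the paper does not state that torchvision was used, and
# ImageNet-Dog has no torchvision loader, so it is omitted here.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

stl10 = datasets.STL10(root="./data", split="train", download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)
```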
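The Pseudocode and Dataset Splits rows refer to a PyTorch-like Algorithm 1 and to temperatures τ and κ fixed at 1.0. As a point of reference only, the sketch below shows a generic InfoNCE-style contrastive loss with a temperature parameter; it is an illustrative assumption about the contrastive component, not the authors' Algorithm 1 or the MiCE mixture-of-experts objective.

```python
# Illustrative only: a generic InfoNCE-style contrastive loss with temperature tau,
# written in the "PyTorch-like style" the Pseudocode row refers to. This is NOT
# the MiCE objective; the mixture-of-experts gating is not reproduced here.
import torch
import torch.nn.functional as F

def info_nce(query, key, queue, tau=1.0):
    """query, key: (N, D) embeddings of two augmented views; queue: (K, D) negatives."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    queue = F.normalize(queue, dim=1)
    l_pos = torch.sum(query * key, dim=1, keepdim=True)  # (N, 1) positive logits
    l_neg = query @ queue.t()                             # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)               # positive is at index 0
```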
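The Experiment Setup row fully specifies the optimizer and learning-rate schedule. A minimal sketch of the CIFAR-10/100 configuration, assuming PyTorch's SGD and MultiStepLR (the paper does not name a particular scheduler implementation), is given below; the model and training loop are placeholders.

```python
# A minimal sketch of the quoted optimization setup for CIFAR-10/100: SGD with
# lr 1.0, momentum 0.9, weight decay 1e-4, trained for 1000 epochs with the
# learning rate multiplied by 0.1 at epochs 480, 640, and 800.
import torch

model = torch.nn.Linear(128, 10)  # placeholder for the actual MiCE network
optimizer = torch.optim.SGD(model.parameters(), lr=1.0, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[480, 640, 800], gamma=0.1)

for epoch in range(1000):
    # ... one epoch of training with batch size 256 would go here ...
    scheduler.step()  # multiplies the learning rate by 0.1 at the listed milestones
```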