MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering
Authors: Tsung Wei Tsai, Chongxuan Li, Jun Zhu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate the clustering performance of MiCE on four widely adopted natural image datasets. MiCE achieves significantly better results than various previous methods and a strong contrastive learning baseline. We present experimental results to demonstrate the effectiveness of MiCE. |
| Researcher Affiliation | Collaboration | Tsung Wei Tsai, Chongxuan Li, Jun Zhu Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084 China {peter83112414,chongxuanli1991}@gmail.com, dcszj@mail.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Pseudocode of MiCE in a PyTorch-like style |
| Open Source Code | Yes | Code is available at: https://github.com/TsungWeiTsai/MiCE |
| Open Datasets | Yes | We evaluate the clustering performance of MiCE on four widely adopted natural image datasets, including STL-10 (Coates et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and ImageNet-Dog (Chang et al., 2017). |
| Dataset Splits | No | As it is often infeasible to tune the hyper-parameters with a validation dataset in real-world clustering tasks (Ghasedi Dizaji et al., 2017), we set both temperatures τ and κ as 1.0. |
| Hardware Specification | Yes | For all four datasets, experiments are conducted on a single GPU (NVIDIA GeForce GTX 1080 Ti). |
| Software Dependencies | No | The paper mentions implementing the pseudocode in a "PyTorch-like style" and using SGD as an optimizer, but it does not specify version numbers for PyTorch, Python, CUDA, or any other software dependencies. |
| Experiment Setup | Yes | We set both temperatures τ and κ as 1.0, and the batch size as 256. We set the SGD weight decay as 0.0001 and the SGD momentum as 0.9 (He et al., 2020). The learning rate is initialized to 1.0 and is multiplied by 0.1 at three different epochs. For CIFAR-10/100, we train for 1000 epochs in total and multiply the learning rate by 0.1 at 480, 640, and 800 epochs. For STL-10, the total epochs are 6000 and the learning rate is multiplied by 0.1 at 3000, 4000, and 5000 epochs. Lastly, for ImageNet-Dog, the total epochs are 3000 and the learning rate is multiplied by 0.1 at 1500, 2000, and 2500 epochs. |
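
The reported optimizer and schedule map directly onto standard PyTorch objects. The sketch below is a minimal, hedged reconstruction of the CIFAR-10/100 configuration only (SGD with lr 1.0, momentum 0.9, weight decay 0.0001; learning rate decayed by 0.1 at epochs 480, 640, 800; batch size 256; both temperatures fixed at 1.0). The `nn.Sequential` encoder is a placeholder assumption, not the authors' model, and the MiCE loss itself is omitted; see the linked repository for the actual implementation.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder encoder (assumption); the paper uses a deep backbone, not this toy layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

# Reported hyper-parameters: SGD with lr 1.0, momentum 0.9, weight decay 0.0001.
optimizer = optim.SGD(model.parameters(), lr=1.0, momentum=0.9, weight_decay=1e-4)

# CIFAR-10/100 schedule from the table: 1000 epochs total,
# learning rate multiplied by 0.1 at epochs 480, 640, and 800.
scheduler = MultiStepLR(optimizer, milestones=[480, 640, 800], gamma=0.1)

batch_size = 256        # reported batch size
tau, kappa = 1.0, 1.0   # both temperatures fixed at 1.0 (no validation tuning)

for epoch in range(1000):
    # ... one epoch of MiCE training on mini-batches of size 256 goes here ...
    scheduler.step()
```

The STL-10 and ImageNet-Dog runs differ only in the total epoch count and the milestone epochs given in the table above.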