Weakly Supervised Clustering by Exploiting Unique Class Count
Authors: Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have constructed a neural network based ucc classifier and experimentally shown that the clustering performance of our framework with our weakly supervised ucc classifier is comparable to that of fully supervised learning models where labels for all instances are known. Furthermore, we have tested the applicability of our framework to a real world task of semantic segmentation of breast cancer metastases in histological lymph node sections and shown that the performance of our weakly supervised framework is comparable to the performance of a fully supervised Unet model. |
| Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore 117417; (2) A*STAR Bioinformatics Institute, Singapore 138671; (3) Image and Pervasive Access Lab (IPAL), CNRS UMI 2955, Singapore 138632; (4) Singapore Eye Research Institute, Singapore 169856; (5) A*STAR Genome Institute of Singapore, Singapore 138672 |
| Pseudocode | No | The paper describes the model architecture and training process using textual descriptions and mathematical formulas, but it does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code and trained models: http://bit.ly/uniqueclasscount |
| Open Datasets | Yes | This section analyzes the performances of our UCC models and fully supervised models in terms of our eventual objective of unsupervised instance clustering on MNIST (10 clusters) (LeCun et al., 1998), CIFAR10 (10 clusters) and CIFAR100 (20 clusters) datasets (Krizhevsky & Hinton, 2009). ... We have used 512 × 512 image crops from publicly available CAMELYON dataset (Litjens et al., 2018) |
| Dataset Splits | Yes | For MNIST, we randomly split off 10,000 images from the training set as a validation set, so we had 50,000, 10,000 and 10,000 images in our training Xmnist,tr, validation Xmnist,val and test Xmnist,test sets, respectively. ... Similar to the MNIST dataset, we randomly split off 10,000 images from the training set as a validation set. Hence, we had 40,000, 10,000 and 10,000 images in our training Xcifar10,tr, validation Xcifar10,val and testing Xcifar10,test sets for CIFAR10, respectively. ... Similar to the other datasets, we randomly split off 10,000 images from the training set as a validation set. Hence, we had 40,000, 10,000 and 10,000 images in our training Xcifar100,tr, validation Xcifar100,val and testing Xcifar100,test sets for CIFAR100, respectively. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper describes the use of neural networks and deep learning models, implying the use of associated software frameworks (e.g., TensorFlow, PyTorch). However, it does not explicitly list any software dependencies with their specific version numbers (e.g., 'Python 3.7', 'PyTorch 1.9'). |
| Experiment Setup | Yes | For the KDE module, we tried 11 bins and 21 bins with σ = 0.1 and σ = 0.01; the best results were obtained with 11 bins and σ = 0.1. Similarly, we tested different numbers of features at the output of the θ_feature module and decided to use 10 features for the MNIST and CIFAR10 datasets and 16 features for the CIFAR100 dataset, based on clustering performance and computational burden. (A minimal KDE sketch follows the table.) |
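
The dataset-splits row describes carving a fixed 10,000-image validation set out of each original training set while leaving the official test set untouched. The snippet below is a minimal sketch of that procedure in plain NumPy; the function name `split_train_val` and the fixed seed are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np

def split_train_val(x_train, y_train, num_val=10_000, seed=0):
    """Randomly carve a validation set out of the original training set.

    Mirrors the splits reported in the paper: e.g. MNIST 60,000 -> 50,000 train
    + 10,000 val; CIFAR10/CIFAR100 50,000 -> 40,000 train + 10,000 val.
    The official 10,000-image test sets are left untouched.
    """
    rng = np.random.default_rng(seed)            # fixed seed only for reproducibility of the sketch
    idx = rng.permutation(len(x_train))          # shuffle indices once
    val_idx, tr_idx = idx[:num_val], idx[num_val:]
    return (x_train[tr_idx], y_train[tr_idx]), (x_train[val_idx], y_train[val_idx])
```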
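The experiment-setup row refers to the KDE module, which turns a bag's instance features into per-feature distributions sampled at a fixed number of bins. Below is a minimal NumPy sketch of such a Gaussian kernel density estimate using the reported best setting (11 bins, σ = 0.1); it assumes the features have been squashed into [0, 1] and is an illustration of the idea, not the authors' released implementation.

```python
import numpy as np

def kde_module(features, num_bins=11, sigma=0.1):
    """Kernel density estimation over each feature dimension of a bag.

    features : (num_instances, num_features) array with values in [0, 1]
               (e.g. sigmoid outputs of the feature extractor).
    Returns  : (num_features, num_bins) array; each row is a normalized
               distribution sampled at `num_bins` points in [0, 1].
    """
    bins = np.linspace(0.0, 1.0, num_bins)              # sample points in [0, 1]
    diff = features[:, :, None] - bins[None, None, :]   # (N, F, B) distances to bin centres
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)          # Gaussian kernel with width sigma
    density = kernel.mean(axis=0)                        # average over instances -> (F, B)
    density /= density.sum(axis=1, keepdims=True)        # normalize each feature's distribution
    return density

# Toy bag: 32 instances with 10 features (the feature count quoted for MNIST/CIFAR10)
bag = np.random.rand(32, 10)
dist = kde_module(bag, num_bins=11, sigma=0.1)
print(dist.shape)  # (10, 11)
```

In the paper, these per-feature distributions are what the ucc classification head consumes; the bin count and kernel width above are simply the best values quoted in the row.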