Unsupervised Clustering using Pseudo-semi-supervised Learning
Authors: Divam Gupta, Ramachandran Ramjee, Nipun Kwatra, Muthian Sivathanu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach outperforms state of the art clustering results for multiple image and text datasets. For example, we achieve 54.6% accuracy for CIFAR10 and 43.9% for 20news, outperforming state of the art by 8-12% in absolute terms. Project details and code are available at https://divamgupta.com/pseudo-semi-supervised-clustering |
| Researcher Affiliation | Collaboration | Divam Gupta Carnegie Mellon University divam@cmu.edu Ramachandran Ramjee Microsoft Research India ramjee@microsoft.com Nipun Kwatra Microsoft Research India nipun.kwatra@microsoft.com Muthian Sivathanu Microsoft Research India muthian@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Get high precision clusters using ensembles |
| Open Source Code | Yes | Project details and code are available at https://divamgupta.com/pseudo-semi-supervised-clustering |
| Open Datasets | Yes | We evaluate Kingdra on three image datasets and two text datasets: MNIST is a dataset of 70000 handwritten digits of 28-by-28 pixel size. [...] CIFAR10 is a dataset of 32-by-32 color images with 10 classes having 6000 examples each. [...] STL is a dataset of 96-by-96 color images with 10 classes having 1300 examples each. [...] Reuters is a dataset containing English news stories with imbalanced data and four categories. [...] 20News is a dataset containing newsgroup documents with 20 different newsgroups. |
| Dataset Splits | No | The paper mentions using standard datasets and evaluating performance but does not specify explicit training, validation, and test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) that would be needed for reproducibility. |
| Hardware Specification | Yes | On a server with four P100 GPUs, CLadder-IM takes 2mins, CLadder-IM with ensemble takes 8mins and Kingdra with 10 iterations takes 80mins while IMSAT(RPT) takes 5mins. |
| Software Dependencies | No | The paper describes the use of models and techniques (e.g., 'Ladder networks', 'Resnet-50') and mentions other research, but it does not specify any version numbers for software libraries, frameworks (like TensorFlow or PyTorch), or programming languages used for implementation. |
| Experiment Setup | No | The paper states that the same data pre-processing and model layer sizes as a prior work (Hu et al. (2017)) were used, and it describes the proposed loss functions and model architectures. However, it does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings necessary for reproducing the experimental setup. |
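The Pseudocode row above refers to the paper's Algorithm 1 ("Get high precision clusters using ensembles"), whose full listing is not reproduced here. As context for what an ensemble-agreement selection step generally looks like, the following is a minimal, hypothetical sketch (not the paper's exact procedure): several clusterings of the same data are label-aligned against a reference member, and only points on which every ensemble member agrees are kept as high-precision cluster assignments. The function names and the greedy confusion-matrix alignment are illustrative assumptions.

```python
import numpy as np

def align_labels(ref, other, n_clusters):
    # Cluster ids are arbitrary per run, so map each cluster id in
    # `other` to the reference cluster it overlaps most (greedy matching
    # via a confusion matrix; illustrative, not the paper's method).
    conf = np.zeros((n_clusters, n_clusters), dtype=int)
    for r, o in zip(ref, other):
        conf[o, r] += 1
    mapping = conf.argmax(axis=1)
    return mapping[other]

def high_precision_points(labelings, n_clusters):
    # Keep only points where every aligned ensemble member agrees.
    # Returns a boolean mask of "high precision" points and the
    # reference labels used for them.
    ref = labelings[0]
    aligned = [ref] + [align_labels(ref, l, n_clusters) for l in labelings[1:]]
    aligned = np.stack(aligned)
    agree = (aligned == aligned[0]).all(axis=0)
    return agree, ref

# Toy usage: three clusterings of five points; the second member uses
# permuted cluster ids, the third disagrees on the last point.
labelings = [
    np.array([0, 0, 1, 1, 2]),
    np.array([1, 1, 0, 0, 2]),
    np.array([0, 0, 1, 2, 2]),
]
mask, labels = high_precision_points(labelings, n_clusters=3)
# Four of five points survive; the contested last point is dropped.
```

In a pseudo-semi-supervised pipeline, the surviving points would then serve as labeled seeds for a semi-supervised learner, which is the general idea the paper's ensemble step supports.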