Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach

Authors: Tri Nguyen, Shahana Ibrahim, Xiao Fu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our method over a series of DCC tasks and observe that the proposed approach significantly improves the performance over existing paradigms, especially when annotation noise exists. Our finding shows the significance of identifiability in DCC, echoing observations made in similar semi-supervised/unsupervised problems, e.g., (Arora et al., 2013; Kumar et al., 2013; Anandkumar et al., 2014; Zhang et al., 2014). We also evaluate the algorithms using real data collected through the Amazon Mechanical Turk (AMT) platform. The code is published at github.com/ductri/VolMaxDCC. Datasets. We use STL-10 (Coates et al., 2011), ImageNet-10 (Chang et al., 2017a), and CIFAR-10 (Krizhevsky et al., 2009).
Researcher Affiliation | Academia | School of Electrical Engineering and Computer Science, Oregon State University, OR, USA.
Pseudocode | No | The paper describes the proposed method and algorithm implementation in text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | The code is published at github.com/ductri/VolMaxDCC.
Open Datasets | Yes | Datasets. We use STL-10 (Coates et al., 2011), ImageNet-10 (Chang et al., 2017a), and CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | Yes | We use a validation set for the baselines whenever proper for parameter tuning and algorithm stopping. The sizes of the validation sets are Nvalid = 1000 for STL-10 and ImageNet-10 and Nvalid = 5000 for CIFAR-10. (A data-loading and split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory specifications, or cloud resources) used for running the experiments.
Software Dependencies | No | The paper mentions using stochastic gradient descent and references a pre-training method, but it does not specify software dependencies such as programming language versions, library versions (e.g., PyTorch, TensorFlow), or CUDA versions.
Experiment Setup | Yes | In our implementation, we use stochastic gradient descent with a batch size of 128. We set the learning rates for B and θ to be 0.1 and 0.5, respectively. The initialization of θ is chosen randomly following uniform distributions with parameters depending on the output dimension of each layer. To initialize B, we set the diagonal elements to 1 and the other elements to 1. (A training-setup sketch follows the table.)
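The "Open Datasets" and "Dataset Splits" rows can be illustrated with a short loading sketch. This is a minimal, hypothetical example: the paper does not name its software stack, so torchvision, the ./data root, the seed, and the carve_validation helper are all assumptions. STL-10 and CIFAR-10 ship with torchvision, while ImageNet-10 (Chang et al., 2017a) is a 10-class ImageNet subset that would have to be prepared separately; only the validation sizes (1000 and 5000) are taken from the paper.

```python
# Hypothetical data-loading sketch; torchvision is an assumption, not stated in the paper.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# STL-10 and CIFAR-10 are available directly from torchvision;
# ImageNet-10 (Chang et al., 2017a) would need to be assembled separately.
stl10 = datasets.STL10(root="./data", split="train", download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

def carve_validation(dataset, n_valid, seed=0):
    """Split off a validation subset of size n_valid; the remainder is kept for training."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [len(dataset) - n_valid, n_valid], generator=generator)

# Validation sizes reported in the paper: 1000 for STL-10 (and ImageNet-10), 5000 for CIFAR-10.
stl10_train, stl10_valid = carve_validation(stl10, n_valid=1000)
cifar10_train, cifar10_valid = carve_validation(cifar10, n_valid=5000)
```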
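The "Experiment Setup" row corresponds to a standard two-learning-rate SGD configuration. The sketch below is an assumption-laden PyTorch illustration, not the authors' implementation: the network f_theta, its layer sizes, K = 10, and the identity-style initialization of B are placeholders (the quoted off-diagonal value for B reads ambiguously), while the batch size of 128 and the learning rates of 0.1 for B and 0.5 for θ come from the paper's text.

```python
# Hypothetical training-setup sketch; only the batch size and learning rates are from the paper.
import torch
import torch.nn as nn

K = 10  # all three benchmark datasets have 10 classes

# Placeholder clustering network f_theta; PyTorch's default layer initialization is
# uniform with bounds tied to the layer dimensions, in the spirit of the quoted
# description of theta's initialization.
f_theta = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, K))

# B is a K x K matrix learned jointly with theta. The quoted initialization
# ("diagonal elements ... 1 and the other elements 1") is ambiguous in the extraction,
# so an identity initialization is used here purely as an assumption.
B = nn.Parameter(torch.eye(K))

# SGD with separate learning rates: 0.1 for B and 0.5 for theta, as reported.
optimizer = torch.optim.SGD([
    {"params": [B], "lr": 0.1},
    {"params": f_theta.parameters(), "lr": 0.5},
])

# Mini-batch training would use torch.utils.data.DataLoader(train_dataset, batch_size=128,
# shuffle=True); the geometric-regularization loss itself is defined in the paper and
# omitted here.
```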