Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation
Authors: Louis Mahon, Thomas Lukasiewicz
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experimental Evaluation), Datasets and Metrics: We report results on CIFAR-10 (C10), CIFAR-100 (C100), Fashion-MNIST (FMNIST), and STL, with image sizes 32, 32, 28, and 96, respectively, and the human activity recognition (HAR) dataset REALDISP, of 17 subjects performing 33 different activities wearing accelerometers. We use the standard clustering metrics of accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI), defined as, e.g., in (Sheng and Huber 2020). We also report the KL-divergence from the ground truth of the model's empirical distribution over clusters, denoted KL*. |
| Researcher Affiliation | Academia | Louis Mahon (1,3), Thomas Lukasiewicz (2,3): (1) School of Informatics, University of Edinburgh, UK; (2) Institute of Logic and Computation, Vienna University of Technology, Austria; (3) Department of Computer Science, University of Oxford, UK |
| Pseudocode | Yes | Algorithm 1: During training, cluster labels are assigned batchwise, with partition support provided by a uniform prior across clusters. During inference, cluster labels are assigned pointwise, without any explicit partition support. |
| Open Source Code | Yes | Code is available at https://github.com/Lou1sM/online_hard_clustering. |
| Open Datasets | Yes | We report results on CIFAR-10 (C10), CIFAR-100 (C100), Fashion-MNIST (FMNIST), and STL, with image sizes 32, 32, 28, and 96, respectively, and the human activity recognition (HAR) dataset REALDISP |
| Dataset Splits | No | The paper does not explicitly state training, validation, and test dataset splits with specific percentages, counts, or a detailed splitting methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Training uses Adam' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Training uses Adam, with learning rate 1e-3, β1 = 0.9, β2 = 0.99, and batch size 256. We follow previous works in setting K, the number of clusters, to the number of ground-truth classes. We set Σ = σI with σ = 1e2. |
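
The Research Type row quotes the paper's four evaluation metrics: ACC, NMI, ARI, and KL*. Below is a minimal sketch of how these could be computed with scikit-learn and scipy; the function names and the direction of the KL term are our reading of the quoted definition, not code from the paper's repository.

```python
# Sketch of the four reported clustering metrics: ACC, NMI, ARI, and KL*.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score


def cluster_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one cluster-to-class mapping,
    found with the Hungarian algorithm on the contingency matrix."""
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1  # (predicted cluster, true class) co-occurrences
    rows, cols = linear_sum_assignment(-counts)  # negate to maximize matches
    return counts[rows, cols].sum() / len(y_true)


def kl_from_ground_truth(y_true, y_pred, n_clusters, eps=1e-12):
    """KL*: divergence of the model's empirical cluster distribution from
    the ground-truth class distribution (direction as we read the paper)."""
    p = np.bincount(y_true, minlength=n_clusters) / len(y_true)  # ground truth
    q = np.bincount(y_pred, minlength=n_clusters) / len(y_pred)  # model
    return float(np.sum(q * np.log((q + eps) / (p + eps))))


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])  # correct up to cluster relabeling
print(cluster_accuracy(y_true, y_pred))                    # 1.0
print(normalized_mutual_info_score(y_true, y_pred))        # 1.0
print(adjusted_rand_score(y_true, y_pred))                 # 1.0
print(kl_from_ground_truth(y_true, y_pred, n_clusters=3))  # 0.0
```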
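
Algorithm 1, as quoted in the Pseudocode row, distinguishes batchwise training assignments, constrained by a uniform prior over clusters, from unconstrained pointwise inference assignments. The sketch below illustrates that structure with a balanced assignment solved by the Hungarian algorithm over replicated cluster slots; this solver, and the assumption that the batch size divides evenly by K, are our illustrative choices, not necessarily the paper's exact routine.

```python
# Batchwise (balanced) vs. pointwise hard assignment, following the
# structure of Algorithm 1.
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_batchwise(costs):
    """Training-time assignment: each of the K clusters receives exactly
    B/K samples, enforcing the uniform prior over clusters.
    costs: (B, K) array, e.g., distances from samples to cluster centroids."""
    B, K = costs.shape
    assert B % K == 0, "sketch assumes batch size divisible by K"
    slots = B // K
    # Replicate each cluster column into B/K "slots", then solve a
    # one-to-one assignment of samples to slots.
    slot_costs = np.repeat(costs, slots, axis=1)  # (B, B)
    rows, cols = linear_sum_assignment(slot_costs)
    return cols[np.argsort(rows)] // slots  # map slot index back to cluster id


def assign_pointwise(costs):
    """Inference-time assignment: nearest cluster, no partition constraint."""
    return costs.argmin(axis=1)


rng = np.random.default_rng(0)
costs = rng.random((12, 3))                  # toy batch of 12 samples, K = 3
print(np.bincount(assign_batchwise(costs)))  # [4 4 4]: uniform partition
print(np.bincount(assign_pointwise(costs), minlength=3))  # unconstrained
```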
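
The Experiment Setup row gives enough hyperparameters to reconstruct the optimizer configuration. A minimal PyTorch sketch follows, assuming a placeholder encoder and random stand-in data; the paper's actual architecture and loss are not reproduced here.

```python
# Quoted optimization setup: Adam, lr 1e-3, beta1 = 0.9, beta2 = 0.99,
# batch size 256, K = number of ground-truth classes.
import torch
from torch.utils.data import DataLoader, TensorDataset

K = 10       # e.g., CIFAR-10 has 10 ground-truth classes
SIGMA = 1e2  # the quoted covariance Sigma = sigma * I uses sigma = 1e2

# Placeholder encoder; the paper's architecture is not specified here.
encoder = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, K),
)

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3, betas=(0.9, 0.99))

# Random tensors stand in for the image datasets.
data = TensorDataset(torch.randn(1024, 3, 32, 32))
loader = DataLoader(data, batch_size=256, shuffle=True)
```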