Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-supervise, Refine, Repeat: Improving Unsupervised Anomaly Detection
Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O Arik, Chen-Yu Lee, Tomas Pfister
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments across various datasets from different domains, including semantic AD (CIFAR-10 (Krizhevsky & Hinton, 2009), Dog-vs-Cat (Elson et al., 2007)), real-world manufacturing visual AD use case (MVTec (Bergmann et al., 2019)), and real-world tabular AD benchmarks (e.g., detecting medical or network anomalies). We evaluate models at different anomaly ratios of unlabeled training data and show that SRR significantly boosts performance. |
| Researcher Affiliation | Industry | EMAIL Google Cloud AI |
| Pseudocode | Yes | Algorithm 1 SRR: Self-supervise, Refine, Repeat. Input: Train data $D = \{x_i\}_{i=1}^{N}$, ensemble count ($K$), threshold ($\gamma$). Output: Refined data ($\hat{D}$), trained OCC ($f$), feature extractor ($g$). |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We conduct extensive experiments across various datasets from different domains, including semantic AD (CIFAR-10 (Krizhevsky & Hinton, 2009), Dog-vs-Cat (Elson et al., 2007)), real-world manufacturing visual AD use case (MVTec (Bergmann et al., 2019)), and real-world tabular AD benchmarks (e.g., detecting medical or network anomalies). Following (Zong et al., 2018; Bergman & Hoshen, 2019), we test the performance of SRR on a variety of real-world tabular AD datasets, including network (KDDCup) and medical (Thyroid, Arrhythmia) AD from the UCI repository (Asuncion & Newman, 2007). |
| Dataset Splits | Yes | To construct the data splits, we utilize 50% of normal samples for training. In addition, we hold out some anomaly samples (amounting to 10% of the normal samples) from the data. This allows to simulate unsupervised settings with an anomaly ratio of up to 10% of entire training set. Rest of the data is used for testing. For MVTec, since there are no anomalous data available for training, we borrow 10% of the anomalies from the test set and swap them with normal samples in the training set. |
| Hardware Specification | Yes | Each experimental run is performed on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using specific models (e.g., ResNet-18 architecture) and optimizers (Momentum SGD) but does not provide specific version numbers for software libraries or programming languages. |
| Experiment Setup | Yes | The same model and hyperparameter configurations are used for SRR with K = 5 classifiers in the ensemble. We set γ as twice the anomaly ratio of training data. For 0% anomaly ratio, we set γ as 0.5. Finally, a Gaussian Density Estimator (GDE) on learned representations is used as the OCC. Optimizer: Momentum SGD (momentum = 0.9); Learning rate: 0.001; Batch size: 64 M; L2 weight regularization: 0.00003; Random projection dimension: 32 |
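
The Experiment Setup row quotes a threshold rule (γ is twice the anomaly ratio, or 0.5 at a 0% ratio) and a Gaussian Density Estimator used as the one-class classifier (OCC). The sketch below is one rough reading of a single refinement pass built from those two pieces, not the authors' implementation. Several things are assumptions: that γ and the anomaly ratio are both percentages, that γ is the share of highest-scoring samples to drop, and that squared Mahalanobis distance is the anomaly score. The names `gamma_for`, `GaussianDensityEstimator`, and `refine_once` are hypothetical. The ensemble of K = 5 classifiers and the self-supervised feature extractor are omitted, and the features are synthetic.

```python
import numpy as np


def gamma_for(anomaly_ratio_pct: float) -> float:
    """Quoted rule: gamma is twice the anomaly ratio, or 0.5 when the ratio is 0.

    Assumption: both quantities are expressed in percent.
    """
    return 0.5 if anomaly_ratio_pct == 0 else 2.0 * anomaly_ratio_pct


class GaussianDensityEstimator:
    """One-class scorer that fits a single Gaussian to the features.

    The score is the squared Mahalanobis distance to the fitted mean, so a
    higher score means a lower Gaussian density (more anomalous).
    """

    def fit(self, feats: np.ndarray) -> "GaussianDensityEstimator":
        self.mean_ = feats.mean(axis=0)
        # A small ridge term keeps the covariance matrix invertible.
        cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
        self.precision_ = np.linalg.inv(cov)
        return self

    def score(self, feats: np.ndarray) -> np.ndarray:
        diff = feats - self.mean_
        return np.einsum("ij,jk,ik->i", diff, self.precision_, diff)


def refine_once(feats: np.ndarray, gamma_pct: float) -> np.ndarray:
    """Return a boolean mask of samples kept after one refinement pass.

    Assumption: the gamma_pct percent of samples with the highest anomaly
    scores are treated as anomalous and removed.
    """
    scores = GaussianDensityEstimator().fit(feats).score(feats)
    threshold = np.percentile(scores, 100.0 - gamma_pct)
    return scores <= threshold


# Synthetic stand-in for learned representations: 200 normal points and
# 10 shifted points, i.e. roughly a 5% anomaly ratio.
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 4))
anomalous = rng.normal(loc=8.0, size=(10, 4))
feats = np.vstack([normal, anomalous])

keep = refine_once(feats, gamma_for(5.0))
print(keep.sum(), "of", len(feats), "samples kept")
```

Setting γ above the expected anomaly ratio trades some normal samples for a cleaner refined set, which is consistent with the quoted choice of twice the anomaly ratio. In the full method this pass would be repeated as the representation is retrained.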