Energy-based Hopfield Boosting for Out-of-Distribution Detection

Authors: Claus Hofmann, Simon Schmid, Bernhard Lehner, Daniel Klotz, Sepp Hochreiter

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 from 2.28 to 0.92 on CIFAR-10, from 11.76 to 7.94 on CIFAR-100, and from 50.74 to 36.60 on ImageNet-1K.
Researcher Affiliation | Collaboration | 1 Institute for Machine Learning, JKU LIT SAL IWS Lab, Johannes Kepler University, Linz, Austria; 2 Software Competence Center Hagenberg GmbH, Austria; 3 Silicon Austria Labs, JKU LIT SAL IWS Lab, Linz, Austria; 4 Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research UFZ, Leipzig, Germany
Pseudocode | Yes | Algorithm 1 provides an outline of Hopfield Boosting.
Open Source Code | Yes | We provide the code to reproduce the experimental results of Hopfield Boosting in the submission. All data sets used are publicly available. Descriptions of how to run the code and how to obtain the data sets come with the code.
Open Datasets | Yes | We train Hopfield Boosting with ResNet-18 (He et al., 2016) on the CIFAR-10 and CIFAR-100 data sets (Krizhevsky, 2009), respectively. In these settings, we use ImageNet-RC (Chrabaszcz et al., 2017) (a low-resolution version of ImageNet) as the AUX data set. For testing the OOD detection performance, we use the data sets SVHN (Street View House Numbers) (Netzer et al., 2011), Textures (Cimpoi et al., 2014), iSUN (Xu et al., 2015), Places365 (López-Cifuentes et al., 2020), and two versions of the LSUN data set (Yu et al., 2015)... We use ImageNet-1K (Russakovsky et al., 2015) as the ID data set and ImageNet-21K (Ridnik et al., 2021) as the AUX data set.
Dataset Splits | Yes | It is common to choose the threshold γ so that 95% of ID samples from a previously unseen validation set are correctly classified as ID. However, metrics like the area under the receiver operating characteristic (AUROC) can be computed directly on s(ξ) without specifying γ, since the AUROC computation sweeps over the threshold. We use a validation process with different OOD data for model selection.
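As a sketch of the two metrics mentioned in that row (illustrative code, not from the paper's release; it assumes higher scores s(ξ) indicate ID), FPR95 fixes γ at the 5th percentile of ID scores so that 95% of ID samples are kept, while AUROC needs no threshold at all:

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    # γ chosen so that 95% of ID samples score above it (classified as ID)
    gamma = np.percentile(id_scores, 5)
    # FPR95: fraction of OOD samples mistakenly classified as ID
    return float(np.mean(np.asarray(ood_scores) >= gamma))

def auroc(id_scores, ood_scores):
    # Probability that a random ID sample outscores a random OOD sample,
    # counting ties as 1/2; equivalent to sweeping γ over all thresholds.
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float((id_s > ood_s).mean() + 0.5 * (id_s == ood_s).mean())
```

With perfectly separated scores, fpr_at_95_tpr returns 0.0 and auroc returns 1.0. The O(n·m) pairwise AUROC is fine for a sketch; a rank-based implementation scales better to large evaluation sets.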
Hardware Specification | Yes | Our experiments were conducted on an internal cluster equipped with a variety of different GPU types (ranging from the NVIDIA Titan V to the NVIDIA A100-SXM-80GB). For our experiments on ImageNet-1K, we additionally used resources of an external cluster that is equipped with NVIDIA A100-SXM-64GB GPUs. For our experiments with Hopfield Boosting on CIFAR-10 and CIFAR-100, one run (100 epochs) of Hopfield Boosting trained for about 8.0 hours on a single NVIDIA RTX 2080 Ti GPU and required 4.3 GB of VRAM.
Software Dependencies | No | The paper mentions "TorchVision, 2016" but does not specify software versions for programming languages, libraries, or other dependencies required to replicate the experiments (e.g., Python version, PyTorch version, CUDA version).
Experiment Setup | Yes | The network trains for 100 epochs (CIFAR-10/100) or 4 epochs (ImageNet-1K), respectively. In each epoch, the model processes the entire ID data set and a selection of AUX samples (sampled according to wt). We sample mini-batches of size 128 per data set, resulting in a combined batch size of 256. We evaluate the composite loss from Equation (11) for each resulting mini-batch and update the model accordingly. After an epoch, we update the sample weights, yielding wt+1. Like Yang et al. (2022), we use SGD with an initial learning rate of 0.1 and a weight decay of 5×10⁻⁴. We decrease the learning rate during the training process with a cosine schedule (Loshchilov & Hutter, 2016). For training Hopfield Boosting, we use a single value for β throughout the training and evaluation process and for all OOD data sets. For model selection, we use a grid search with λ chosen from the set {0.1, 0.25, 0.5, 1.0} and β chosen from the set {2, 4, 8, 16, 32}. In our experiments, β = 4 and λ = 0.5 yield the best results for CIFAR-10 and CIFAR-100. For ImageNet-1K, we set β = 32 and λ = 0.25.
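The schedule and model-selection grid quoted in that row can be sketched as follows (hypothetical helper code, not from the paper's repository; the schedule is assumed to be standard cosine annealing without restarts, decaying from the initial rate toward zero):

```python
import itertools
import math

LR_INIT = 0.1           # initial SGD learning rate, as stated
WEIGHT_DECAY = 5e-4     # weight decay 5×10⁻⁴
EPOCHS_CIFAR = 100      # CIFAR-10 / CIFAR-100
EPOCHS_IMAGENET = 4     # ImageNet-1K

def cosine_lr(epoch, total_epochs, lr_init=LR_INIT):
    """Cosine-annealed learning rate, decaying from lr_init to 0."""
    return 0.5 * lr_init * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Model-selection grid over λ and β, as described in the quoted setup
LAMBDAS = [0.1, 0.25, 0.5, 1.0]
BETAS = [2, 4, 8, 16, 32]
grid = list(itertools.product(LAMBDAS, BETAS))  # 20 candidate configurations
```

Per the quoted selection results, (λ, β) = (0.5, 4) was best for CIFAR-10/100 and (0.25, 32) for ImageNet-1K.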