Online Isolation Forest
Authors: Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer, Albert Bifet, Giacomo Boracchi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental validation on real-world datasets demonstrated that ONLINE-IFOREST is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, ONLINE-IFOREST consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance such as cybersecurity, fraud and fault detection. |
| Researcher Affiliation | Academia | 1Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy 2Artificial Intelligence Institute, University of Waikato, Hamilton, New Zealand. |
| Pseudocode | Yes | Algorithm 1: ONLINE-IFOREST, Algorithm 2: ONLINE-ITREE learn point, Algorithm 3: ONLINE-ITREE forget point, Algorithm 4: ONLINE-ITREE point depth |
| Open Source Code | Yes | The code of our method is publicly available at https://github.com/ineveLoppiliF/Online-Isolation-Forest. |
| Open Datasets | Yes | We run our experiments on the eight largest datasets used in (Liu et al., 2008; 2012) (Http, Smtp (Yamanishi et al., 2004), Annthyroid, Forest Cover Type, Satellite, Shuttle (Asuncion & Newman, 2007), Mammography and Mulcross (Rocke & Woodruff, 1996)), two datasets from Kaggle competitions (Donors and Fraud (Pang et al., 2019)), and the shingled version of the NYC Taxicab dataset used in (Guha et al., 2016). |
| Dataset Splits | No | The paper mentions shuffling the datasets and processing them in a streaming fashion, but it does not specify explicit percentages or sample counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Autorank (Herbold, 2020) library" but does not specify version numbers for Autorank or any other core software libraries (e.g., Python, PyTorch/TensorFlow) that would be needed for reproduction. |
| Experiment Setup | Yes | For comparison purposes, we set the number of trees τ = 32 for all the algorithms, and considered the number of random cuts in LODA equivalent to the number of trees. We set the window size ω = 2048 for both oIFOR and asdIFOR, and used the default value ω = 250 for HST. We set the subsampling size used to build trees in asdIFOR to the default value ψ = 256, and the number of bins for each random projection in LODA to b = 100. The trees' maximum depth δ depends on the subsampling size ψ in asdIFOR, and on the window size ω and the number η of points required to split histogram bins in oIFOR, while it is fixed to the default value δ = 15 in HST. The parameter configuration for all the algorithms is illustrated in Table 3. |
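The hyperparameters reported in the Experiment Setup row can be collected into a small configuration sketch for anyone attempting a reproduction. This is a minimal illustration, not the authors' actual API: the dictionary keys (`num_trees`, `window_size`, etc.) and method labels are assumptions chosen for readability, and only values explicitly stated in the paper's setup are filled in (derived quantities such as δ for oIFOR/asdIFOR are left out rather than guessed).

```python
# Hedged sketch of the per-method hyperparameter settings quoted above.
# Key names are illustrative; the released code's argument names may differ.
configs = {
    # Online Isolation Forest: τ = 32 trees, window ω = 2048;
    # δ is derived from ω and η, so it is not hard-coded here.
    "oIFOR": {"num_trees": 32, "window_size": 2048},
    # Sliding-window iForest baseline: δ is derived from ψ = 256.
    "asdIFOR": {"num_trees": 32, "window_size": 2048, "subsample_size": 256},
    # Half-Space Trees: default window and fixed depth.
    "HST": {"num_trees": 32, "window_size": 250, "max_depth": 15},
    # LODA: random cuts matched to the tree count, b = 100 bins per projection.
    "LODA": {"num_projections": 32, "num_bins": 100},
}

# Sanity check: every method uses 32 trees / projections, as the setup states.
for name, cfg in configs.items():
    ensemble_size = cfg.get("num_trees", cfg.get("num_projections"))
    assert ensemble_size == 32, name
```

Keeping the configuration in one place like this makes it easy to diff against Table 3 of the paper when verifying a reimplementation.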