Kernel QuantTree

Authors: Diego Stucchi, Paolo Rizzo, Nicolò Folloni, Giacomo Boracchi

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We present Kernel Quant Tree (KQT), a nonparametric change detection algorithm that monitors multivariate data through a histogram. KQT constructs a nonlinear partition of the input space that matches pre-defined target probabilities and specifically promotes compact bins adhering to the data distribution, resulting in a powerful detection algorithm. Our experiments show that KQT achieves superior detection power than non-parametric state-of-the-art change detection methods, and can reliably control the false positive rate.
Researcher Affiliation Academia 1Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy.
Pseudocode Yes Algorithm 1 Construction of the GQT histogram 1: Input: training set TR = {xi}N i=1 Rd, target probabilities {πk}K k=1 2: Output: GQT histogram h = {(Sk, bπk)}K k=1 3: Set X 0 = TR 4: for k = 1, . . . , K 1 do 5: Compute eπk = πk(1 P 6: Compute {fk(xi)} for xi X k 1 7: Set qk as the eπk-quantile of {fk(xi)} 8: Sk = {x T j<k Sj | fk(x) qk} 9: X k = {x X k 1 | fk(x) > qk} 10: end for 11: SK = Rd \ S
Open Source Code Yes Code available at github.com/diegocarrera89/quant Tree.
Open Datasets Yes The INSECTS dataset (Souza et al., 2020) contains feature vectors (d = 33) extracted from sensor measurements describing the wing-beat frequency of six (annotated) species of flying insects. We employ real-world datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and from (Dal Pozzolo et al., 2017), with dimensions ranging from d = 5 to d = 50, reported in Table 2. The Swarm Behavior classification dataset from the UCI Machine Learning Repository (Dua & Graff, 2017) comprises high-dimensional data (d = 2400) describing the motion of large groups of animals, which are labeled as flocking or not-flocking.
Dataset Splits No The paper describes setting detection thresholds via Monte Carlo simulations on synthetically generated data and mentions training on a set (TR), but it does not specify explicit train/validation/test splits of the main datasets used for evaluation to tune hyperparameters.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies No The paper mentions various methods and models like 'Gaussian Mixture Model (GMM)', 'Pearson χ2 statistic', 'Welch s t-test', 'PCA', 'K-means clustering', 'Density Tree', but it does not specify version numbers for any of the software packages, libraries, or programming languages used.
Experiment Setup Yes We configure all the histogram-based methods to partition the space in K bins with uniform target probabilities πk = 1 K , as advised by (Boracchi et al., 2018). The number of bins K and the batch size ν must be chosen to guarantee that batches contain enough samples for a stable measure of these target probabilities. We confirm this by considering two settings: the high-ratio setting (K = 16, ν = 128) and the low-ratio one (K = 32, ν = 64). We set the detection thresholds in our experiments to yield an empirical FPR of α = 5%. In KQT, we use M = 4 components.