Kernel QuantTree
Authors: Diego Stucchi, Paolo Rizzo, Nicolò Folloni, Giacomo Boracchi
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Kernel Quant Tree (KQT), a nonparametric change detection algorithm that monitors multivariate data through a histogram. KQT constructs a nonlinear partition of the input space that matches pre-defined target probabilities and specifically promotes compact bins adhering to the data distribution, resulting in a powerful detection algorithm. Our experiments show that KQT achieves superior detection power than non-parametric state-of-the-art change detection methods, and can reliably control the false positive rate. |
| Researcher Affiliation | Academia | 1Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy. |
| Pseudocode | Yes | Algorithm 1 Construction of the GQT histogram 1: Input: training set TR = {xi}N i=1 Rd, target probabilities {πk}K k=1 2: Output: GQT histogram h = {(Sk, bπk)}K k=1 3: Set X 0 = TR 4: for k = 1, . . . , K 1 do 5: Compute eπk = πk(1 P 6: Compute {fk(xi)} for xi X k 1 7: Set qk as the eπk-quantile of {fk(xi)} 8: Sk = {x T j<k Sj | fk(x) qk} 9: X k = {x X k 1 | fk(x) > qk} 10: end for 11: SK = Rd \ S |
| Open Source Code | Yes | Code available at github.com/diegocarrera89/quant Tree. |
| Open Datasets | Yes | The INSECTS dataset (Souza et al., 2020) contains feature vectors (d = 33) extracted from sensor measurements describing the wing-beat frequency of six (annotated) species of flying insects. We employ real-world datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and from (Dal Pozzolo et al., 2017), with dimensions ranging from d = 5 to d = 50, reported in Table 2. The Swarm Behavior classification dataset from the UCI Machine Learning Repository (Dua & Graff, 2017) comprises high-dimensional data (d = 2400) describing the motion of large groups of animals, which are labeled as flocking or not-flocking. |
| Dataset Splits | No | The paper describes setting detection thresholds via Monte Carlo simulations on synthetically generated data and mentions training on a set (TR), but it does not specify explicit train/validation/test splits of the main datasets used for evaluation to tune hyperparameters. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions various methods and models like 'Gaussian Mixture Model (GMM)', 'Pearson χ2 statistic', 'Welch s t-test', 'PCA', 'K-means clustering', 'Density Tree', but it does not specify version numbers for any of the software packages, libraries, or programming languages used. |
| Experiment Setup | Yes | We configure all the histogram-based methods to partition the space in K bins with uniform target probabilities πk = 1 K , as advised by (Boracchi et al., 2018). The number of bins K and the batch size ν must be chosen to guarantee that batches contain enough samples for a stable measure of these target probabilities. We confirm this by considering two settings: the high-ratio setting (K = 16, ν = 128) and the low-ratio one (K = 32, ν = 64). We set the detection thresholds in our experiments to yield an empirical FPR of α = 5%. In KQT, we use M = 4 components. |