Online Random Feature Forests for Learning in Varying Feature Spaces

Authors: Christian Schreckenberger, Yi He, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark our algorithm on 12 datasets, including one novel real-world dataset of government COVID-19 responses collected through a crowd-sensing program in Spain. The empirical results substantiate the viability and effectiveness of our ORF3V algorithm and its superior accuracy performance over the state-of-the-art rival models. Evaluation: This section presents empirical evidence to substantiate the effectiveness of the ORF3V algorithm. We benchmark twelve datasets with two types of feature space dynamics, namely, trapezoidal data streams (TDS) and varying feature spaces (VFS).
Researcher Affiliation | Academia | (1) Chair for Artificial Intelligence, University of Mannheim, Germany; (2) Institute for Enterprise Systems, University of Mannheim, Germany; (3) Department of Computer Science, Old Dominion University, USA
Pseudocode | Yes | Algorithm 1: ORF3V 1: for $t = 1, 2, \ldots, T$ do 2: receive instance $(x_t, y_t)$ 3: update feature stats $D_{f_i,c}$ according to $(x_t, y_t)$ 4: check for $L_i \in L$ to be pruned 5: for all new feature $f_i \in F_t$ do 6: generate $L_i$ based on $D_{f_i}$ with sufficient instances 7: end for 8: for all feature $f_i \in x_t$ do 9: update weights for $L_i \in L$ corresponding to $f_i$ 10: end for 11: if $t \bmod r = 0$ then 12: update $L_i \in L$ according to $D_{f_i}$ 13: end if 14: end for
Open Source Code | No | The paper does not provide an explicit statement about open-source code release or a link to a code repository for the ORF3V method.
Open Datasets | Yes | To validate the applicability of our algorithm, we select ten datasets from the UCI data repository spanning a wide range of domains. Two real-world datasets, IMDB (Maas et al. 2011) and crowdsense, gathered from the crowdsensing platform Smart Citizen (Camprodon et al. 2019), which naturally manifest a varying feature space are employed in our evaluation. The ground truth labels can be retrieved from the Oxford COVID-19 Government Response Tracker (Hale et al. 2021), as the local governments' responses influence crowded regions.
Dataset Splits | No | The paper states 'On each dataset, the instances are presented to the learning algorithms in a one-pass fashion.' and 'where random shuffling has repeated 10 times for cross validation' but does not provide explicit training, validation, and test dataset split percentages or absolute sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | No | The paper describes algorithmic details like 'grace period', 'update strategy s', and the use of 'Hoeffding bound' and 't-digest algorithm'. It also mentions 'a weight $w_i$ in the ensemble, which is initialized with a value of one' and '$w_i = 2\alpha\,[\![y_t = \hat{y}_t]\!] + w_i$'. However, it does not explicitly provide concrete hyperparameter values or training configuration settings (e.g., specific values for $J$, $\alpha$, or the sliding window size $n$) used in the experiments.
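The one-pass loop of Algorithm 1 and the weight update quoted under Experiment Setup can be illustrated with a minimal Python sketch. It is not the authors' implementation. `FeatureStump` is a hypothetical stand-in for a per-feature learner, `run_one_pass` is an illustrative name, and `alpha=0.1` is an arbitrary value, since the table notes that the paper's hyperparameters are not reported. The sketch reads ŷ_t as each learner's own prediction, which is an assumption, and it omits pruning, the grace period, and the periodic rebuild every r steps.

```python
from collections import defaultdict


class FeatureStump:
    """Hypothetical stand-in for one per-feature learner L_i.

    It tracks the running mean of a single feature per class and predicts
    the class whose mean is closest to the observed value. The learners in
    the paper are built differently; this is only a placeholder.
    """

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.weight = 1.0  # ensemble weight w_i, initialised with a value of one

    def update(self, value, label):
        self.sums[label] += value
        self.counts[label] += 1

    def predict(self, value):
        if not self.counts:
            return None
        return min(
            self.counts,
            key=lambda c: abs(value - self.sums[c] / self.counts[c]),
        )


def run_one_pass(stream, alpha=0.1):
    """Process (x_t, y_t) pairs once, predicting before learning.

    Each x_t is a dict {feature_name: value}, so the set of features may
    differ between instances. Returns (number of correct predictions,
    dict of learners keyed by feature name).
    """
    learners = {}
    correct = 0
    for x_t, y_t in stream:
        # Weighted vote over the learners of the features present in x_t.
        votes = defaultdict(float)
        predictions = {}
        for f, v in x_t.items():
            if f in learners:
                p = learners[f].predict(v)
                if p is not None:
                    predictions[f] = p
                    votes[p] += learners[f].weight
        y_hat = max(votes, key=votes.get) if votes else None
        correct += int(y_hat == y_t)

        for f, v in x_t.items():
            # A learner is created the first time its feature appears.
            learner = learners.setdefault(f, FeatureStump())
            if f in predictions:
                # Update as quoted in the table: w_i = 2*alpha*[[y_t = y_hat_t]] + w_i
                # (assumption: y_hat_t is this learner's own prediction).
                learner.weight += 2 * alpha * int(predictions[f] == y_t)
            learner.update(v, y_t)  # update the feature statistics
    return correct, learners
```

A stream for this sketch is any iterable of `(dict, label)` pairs, for example `[({"a": 0.0}, 0), ({"a": 1.0, "b": -5.0}, 1)]`, where the second instance introduces a feature the first one lacks. Keying learners by feature name means a missing or newly arriving feature needs no imputation: absent features simply cast no vote.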