Large-scale Online Kernel Learning with Random Feature Reparameterization

Authors: Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency."
Researcher Affiliation | Collaboration | Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung. Adobe Research, Adobe Systems, Inc.; Center for Pattern Recognition and Data Analytics, Deakin University, Australia. {tu.nguyen, trung.l, dinh.phung}@deakin.edu.au; hubui@adobe.com
Pseudocode | Yes | "Algorithm 1 RRF for online learning." (a hedged sketch of the RRF update appears after the table)
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided, nor does it link to a repository.
Open Datasets | Yes | "These datasets can be downloaded from the LIBSVM and UCI websites, except the airlines data, which was obtained from the American Statistical Association (ASA). For the airlines dataset, our aim is to predict whether a flight will be delayed or not under the binary classification setting, and how long (in minutes) the flight will be delayed in terms of departure time under the regression setting. A flight is considered delayed if its delay time is above 15 minutes, and non-delayed otherwise. Following the procedure in [Hensman et al., 2013], we extract 8 features for flights in the year 2008, and then normalize them into the range [0, 1]." (a preprocessing sketch follows the table)
Dataset Splits | No | The paper mentions tuning hyperparameters on a subset of the data (10% of medium-sized datasets, 1% of large-scale ones) used for grid search, but it does not specify explicit train/validation/test splits with precise percentages or sample counts, nor does it cite predefined splits or cross-validation, so the criteria for explicit dataset split information are not fully satisfied.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the use of the LIBSVM, Budgeted SVM, and LSOKL toolboxes, but it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "For each method, we tune its regularization parameter λ or C, learning rate η, and RBF kernel width γ (our RRF can learn γ) using grid search on a subset of data. In particular, we randomly pick 10% of medium-sized datasets, but only 1% of large-scale datasets, so that the search can finish within an acceptable time budget. The hyperparameters are varied over certain ranges and selected for the best performance (mistake rate) on these subsets. The ranges are given as follows: λ ∈ {2^{-4}/M, 2^{-2}/M, ..., 2^{16}/M}, γ ∈ {2^{-8}, 2^{-4}, 2^{-2}, 2^0, 2^2, 2^4, 2^8}, and η ∈ {10^{-5}, 3×10^{-5}, 10^{-4}, ..., 10^{-2}}, where M is the number of data points. The random feature dimension D of RRF is selected following the approach described in Section 4.2." (a grid-construction sketch follows the table)
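
For readers who want a concrete picture of the pseudocode row above, here is a minimal NumPy sketch of the reparameterization idea behind RRF: the random Fourier frequencies are written as ω_i = σ ⊙ e_i with base noise e_i drawn once, so the kernel-width vector σ can be updated by online gradient descent jointly with the linear weights. The logistic loss, the per-dimension σ, and all names here are our assumptions for illustration, not details confirmed by the table.

import numpy as np

rng = np.random.default_rng(0)

def rrf_features(x, E, sigma):
    # Reparameterized frequencies: omega_i = sigma (elementwise) * e_i,
    # where the rows of E are base noise e_i ~ N(0, I) drawn once and kept fixed.
    proj = (E * sigma) @ x
    D = E.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(D), proj

def rrf_online_step(x, y, w, sigma, E, eta):
    # One joint SGD step on (w, sigma) with logistic loss, y in {-1, +1}.
    D = E.shape[0]
    phi, proj = rrf_features(x, E, sigma)
    g = -y / (1.0 + np.exp(y * (w @ phi)))            # dloss/dscore
    grad_w = g * phi
    # Chain rule through cos/sin: dscore/dproj, then dproj_i/dsigma_j = E[i, j] * x[j].
    dproj = (g / np.sqrt(D)) * (-w[:D] * np.sin(proj) + w[D:] * np.cos(proj))
    grad_sigma = (dproj[:, None] * (E * x[None, :])).sum(axis=0)
    return w - eta * grad_w, sigma - eta * grad_sigma

# Toy usage on a synthetic stream; dimensions and learning rate are illustrative.
d, D, eta = 8, 100, 1e-2
E = rng.standard_normal((D, d))
w, sigma = np.zeros(2 * D), np.ones(d)
for _ in range(1000):
    x = rng.random(d)
    y = 1.0 if x.sum() > d / 2 else -1.0
    w, sigma = rrf_online_step(x, y, w, sigma, E, eta)

Because E stays fixed, learning σ amounts to learning the RBF kernel width through the random features, which is what the "our RRF can learn γ" remark in the experiment-setup row refers to.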
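The airlines row describes a concrete preprocessing recipe: a 15-minute delay threshold for the binary label and eight extracted features scaled to [0, 1]. A small sketch of that recipe follows; the function name, the {-1, +1} label encoding, and the column handling are our assumptions.

import numpy as np

def prepare_airlines(features, delay_minutes):
    # features: (n, 8) array of the extracted flight features (2008 data).
    X = np.asarray(features, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max scale each column to [0, 1]
    delay = np.asarray(delay_minutes, dtype=float)
    y_cls = np.where(delay > 15.0, 1.0, -1.0)        # delayed iff delay exceeds 15 minutes
    y_reg = delay                                    # departure-delay regression target
    return X, y_cls, y_reg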
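Finally, the tuning ranges in the experiment-setup row translate directly into grids. Below is a sketch of how one might materialize them and draw the random tuning subset; the 1-3 log spacing assumed for the η grid between the quoted endpoints is our reading of the "..." in the quote.

import numpy as np

def hyperparameter_grids(M):
    # M is the number of data points, as in the lambda range 2^{-4}/M, ..., 2^{16}/M.
    lambdas = [2.0 ** k / M for k in range(-4, 17, 2)]
    gammas = [2.0 ** k for k in (-8, -4, -2, 0, 2, 4, 8)]
    etas = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2]   # 1-3 log spacing assumed
    return lambdas, gammas, etas

def tuning_subset(X, y, frac, rng):
    # 10% of a medium-sized dataset or 1% of a large-scale one, sampled at random.
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    return X[idx], y[idx]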