Large-scale Online Kernel Learning with Random Feature Reparameterization

Authors: Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency."
Researcher Affiliation | Collaboration | Tu Dinh Nguyen, Trung Le, Hung Bui, Dinh Phung. Adobe Research, Adobe Systems, Inc.; Center for Pattern Recognition and Data Analytics, Deakin University, Australia. {tu.nguyen, trung.l, dinh.phung}@deakin.edu.au; hubui@adobe.com
Pseudocode | Yes | "Algorithm 1 RRF for online learning." (a hedged sketch of the RRF update appears after the table)
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided, nor does it link to a repository.
Open Datasets | Yes | "These datasets can be downloaded from the LIBSVM and UCI websites, except the airlines data, which was obtained from the American Statistical Association (ASA). For the airlines dataset, our aim is to predict whether a flight will be delayed or not under the binary classification setting, and how long (in minutes) the flight will be delayed in terms of departure time under the regression setting. A flight is considered delayed if its delay time is above 15 minutes, and non-delayed otherwise. Following the procedure in [Hensman et al., 2013], we extract 8 features for flights in the year 2008, and then normalize them into the range [0, 1]." (a preprocessing sketch follows the table)
Dataset Splits | No | The paper mentions tuning hyperparameters on a subset of the data (10% of medium-sized datasets, 1% of large-scale ones) used for grid search, but it does not specify explicit train/validation/test splits with precise percentages or sample counts, nor does it cite predefined splits or cross-validation, so the criteria for explicit dataset split information are not fully satisfied.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the use of the LIBSVM, Budgeted SVM, and LSOKL toolboxes, but it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "For each method, we tune its regularization parameter λ or C, learning rate η, and RBF kernel width γ (our RRF can learn γ) using grid search on a subset of data. In particular, we randomly pick 10% of medium-sized datasets, but only 1% of large-scale datasets, so that the search can finish within an acceptable time budget. The hyperparameters are varied over certain ranges and selected for the best performance (mistake rate) on these subsets. The ranges are given as follows: λ ∈ {2^{-4}/M, 2^{-2}/M, ..., 2^{16}/M}, γ ∈ {2^{-8}, 2^{-4}, 2^{-2}, 2^0, 2^2, 2^4, 2^8}, and η ∈ {10^{-5}, 3×10^{-5}, 10^{-4}, ..., 10^{-2}}, where M is the number of data points. The random feature dimension D of RRF is selected following the approach described in Section 4.2." (a grid-construction sketch follows the table)
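
For readers who want a concrete picture of the pseudocode row above, here is a minimal NumPy sketch of the reparameterization idea behind RRF: the random Fourier frequencies are written as ω_i = σ ⊙ e_i with base noise e_i drawn once, so the kernel-width vector σ can be updated by online gradient descent jointly with the linear weights. The logistic loss, the per-dimension σ, and all names here are our assumptions for illustration, not details confirmed by the table.

import numpy as np

rng = np.random.default_rng(0)

def rrf_features(x, E, sigma):
    # Reparameterized frequencies: omega_i = sigma (elementwise) * e_i,
    # where the rows of E are base noise e_i ~ N(0, I) drawn once and kept fixed.
    proj = (E * sigma) @ x
    D = E.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(D), proj

def rrf_online_step(x, y, w, sigma, E, eta):
    # One joint SGD step on (w, sigma) with logistic loss, y in {-1, +1}.
    D = E.shape[0]
    phi, proj = rrf_features(x, E, sigma)
    g = -y / (1.0 + np.exp(y * (w @ phi)))            # dloss/dscore
    grad_w = g * phi
    # Chain rule through cos/sin: dscore/dproj, then dproj_i/dsigma_j = E[i, j] * x[j].
    dproj = (g / np.sqrt(D)) * (-w[:D] * np.sin(proj) + w[D:] * np.cos(proj))
    grad_sigma = (dproj[:, None] * (E * x[None, :])).sum(axis=0)
    return w - eta * grad_w, sigma - eta * grad_sigma

# Toy usage on a synthetic stream; dimensions and learning rate are illustrative.
d, D, eta = 8, 100, 1e-2
E = rng.standard_normal((D, d))
w, sigma = np.zeros(2 * D), np.ones(d)
for _ in range(1000):
    x = rng.random(d)
    y = 1.0 if x.sum() > d / 2 else -1.0
    w, sigma = rrf_online_step(x, y, w, sigma, E, eta)

Because E stays fixed, learning σ amounts to learning the RBF kernel width through the random features, which is what the "our RRF can learn γ" remark in the experiment-setup row refers to.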
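The airlines row describes a concrete preprocessing recipe: a 15-minute delay threshold for the binary label and eight extracted features scaled to [0, 1]. A small sketch of that recipe follows; the function name, the {-1, +1} label encoding, and the column handling are our assumptions.

import numpy as np

def prepare_airlines(features, delay_minutes):
    # features: (n, 8) array of the extracted flight features (2008 data).
    X = np.asarray(features, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max scale each column to [0, 1]
    delay = np.asarray(delay_minutes, dtype=float)
    y_cls = np.where(delay > 15.0, 1.0, -1.0)        # delayed iff delay exceeds 15 minutes
    y_reg = delay                                    # departure-delay regression target
    return X, y_cls, y_reg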
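Finally, the tuning ranges in the experiment-setup row translate directly into grids. Below is a sketch of how one might materialize them and draw the random tuning subset; the 1-3 log spacing assumed for the η grid between the quoted endpoints is our reading of the "..." in the quote.

import numpy as np

def hyperparameter_grids(M):
    # M is the number of data points, as in the lambda range 2^{-4}/M, ..., 2^{16}/M.
    lambdas = [2.0 ** k / M for k in range(-4, 17, 2)]
    gammas = [2.0 ** k for k in (-8, -4, -2, 0, 2, 4, 8)]
    etas = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2]   # 1-3 log spacing assumed
    return lambdas, gammas, etas

def tuning_subset(X, y, frac, rng):
    # 10% of a medium-sized dataset or 1% of a large-scale one, sampled at random.
    idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
    return X[idx], y[idx]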