Dual Space Gradient Descent for Online Learning

Authors: Trung Le, Tu Nguyen, Vu Nguyen, Dinh Phung

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with the state-of-the-art baselines." "In this section, we conduct comprehensive experiments to quantitatively evaluate the performance of our proposed Dual Space Gradient Descent (Dual SGD) on binary classification, multiclass classification and regression tasks under online settings."
Researcher Affiliation | Academia | "Trung Le, Tu Dinh Nguyen, Vu Nguyen, Dinh Phung, Centre for Pattern Recognition and Data Analytics, Deakin University, Australia"
Pseudocode | Yes | "Algorithm 1: The learning of Dual Space Gradient Descent." "Algorithm 2: k-merging Budget Maintenance Procedure." (A hedged sketch of how these two procedures could fit together is given after the table.)
Open Source Code | No | The paper mentions that baseline implementations are 'published as a part of LIBSVM, Budgeted SVM and LSOKL toolboxes.' However, there is no explicit statement or link providing the source code for their proposed Dual SGD method.
Open Datasets | Yes | "We use 5 datasets which are ijcnn1, cod-rna, poker, year, and airlines. The datasets were purposely selected with various sizes in order to clearly expose the differences among scalable capabilities of the models. ... These datasets can be downloaded from LIBSVM and UCI websites, except the airlines which was obtained from American Statistical Association (ASA)."
Dataset Splits | Yes | "For each dataset, we perform 10 runs on each algorithm with different random permutations of the training data samples. In each run, the model is trained in a single pass through the data. Its prediction result and time spent are then reported by taking the average together with the standard deviation over all runs. For comparison, we employ 11 state-of-the-art online kernel learning methods... Hyperparameters setting. There are a number of different hyperparameters for all methods. Each method requires a different set of hyperparameters, e.g., the regularization parameters (λ in Dual SGD), the learning rates (η in FOGD and NOGD), and the RBF kernel width (γ in all methods). Thus, for a fair comparison, these hyperparameters are specified using cross-validation on a subset of data. In particular, we further partition the training set into 80% for learning and 20% for validation." (See the evaluation-protocol sketch after the table.)
Hardware Specification | Yes | "We use a Windows machine with 3.46GHz Xeon processor and 96GB RAM to conduct our experiments."
Software Dependencies | No | The paper mentions using 'LIBSVM', 'Budgeted SVM', and 'LSOKL' toolboxes but does not specify any version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Hyperparameters setting. There are a number of different hyperparameters for all methods. Each method requires a different set of hyperparameters, e.g., the regularization parameters (λ in Dual SGD), the learning rates (η in FOGD and NOGD), and the RBF kernel width (γ in all methods). ... The ranges are given as follows: C ∈ {2^-5, 2^-3, ..., 2^15}, λ ∈ {2^-4/N, 2^-2/N, ..., 2^16/N}, γ ∈ {2^-8, 2^-4, 2^-2, 2^0, 2^2, 2^4, 2^8}, and η ∈ {2^-4, 2^-3, ..., 2^-1, 2^1, 2^2, ..., 2^4}, where N is the number of data points. The budget size B, merging size k and random feature dimension D of Dual SGD are selected following the approach described in Section 3.2. For a good trade-off between classification performance and computational cost, we select B = 100 and D = 200 which achieves fairly comparable classification result and running time." (See the hyperparameter-selection sketch after the table.)
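
The paper's Algorithms 1 and 2 are not reproduced in this report. As a rough illustration only, the following Python sketch shows how a dual-space online learner with a k-merging budget step could be organised: a budgeted kernel expansion (the "dual" part) plus a random Fourier-feature component, with budget overflow handled by merging support vectors into the random-feature weights rather than discarding them. The class name, the hinge loss, the RBF kernel, the fixed learning rate, and the choice to merge the k oldest support vectors are all assumptions made for the sketch, not details taken from the paper.

```python
import numpy as np

class DualSGDSketch:
    """Hedged sketch (not the paper's algorithm): a budgeted kernel expansion
    plus a random Fourier-feature (RFF) component, trained online."""

    def __init__(self, dim, gamma=1.0, eta=0.1, lam=1e-4, B=100, D=200, k=10, seed=0):
        rng = np.random.RandomState(seed)
        self.gamma, self.eta, self.lam = gamma, eta, lam
        self.B, self.k = B, k                      # budget size and merge size
        # RFF parameters approximating the RBF kernel exp(-gamma * ||x - x'||^2)
        self.omega = rng.normal(scale=np.sqrt(2 * gamma), size=(D, dim))
        self.b = rng.uniform(0, 2 * np.pi, size=D)
        self.w = np.zeros(D)                       # weights in the random-feature space
        self.sv_x, self.sv_alpha = [], []          # budgeted support set (dual space)

    def _z(self, x):
        # Random Fourier features z(x) with E[z(x) . z(x')] ~ k(x, x')
        D = len(self.b)
        return np.sqrt(2.0 / D) * np.cos(self.omega @ x + self.b)

    def _kernel(self, x1, x2):
        return np.exp(-self.gamma * np.linalg.norm(x1 - x2) ** 2)

    def decision(self, x):
        f = self.w @ self._z(x)
        for sv, a in zip(self.sv_x, self.sv_alpha):
            f += a * self._kernel(sv, x)
        return f

    def partial_fit(self, x, y):
        """One online step with hinge loss; y in {-1, +1}."""
        f = self.decision(x)
        # Shrink both model parts (effect of the L2 regulariser lambda)
        self.w *= (1.0 - self.eta * self.lam)
        self.sv_alpha = [a * (1.0 - self.eta * self.lam) for a in self.sv_alpha]
        if y * f < 1.0:                            # margin violation -> new support vector
            self.sv_x.append(np.asarray(x, dtype=float))
            self.sv_alpha.append(self.eta * y)
        if len(self.sv_x) > self.B:                # budget exceeded -> k-merging step
            self._merge_k_oldest()

    def _merge_k_oldest(self):
        # Assumption: merge the k oldest support vectors by moving their
        # contribution into the random-feature space instead of discarding it.
        for _ in range(min(self.k, len(self.sv_x))):
            x_old = self.sv_x.pop(0)
            a_old = self.sv_alpha.pop(0)
            self.w += a_old * self._z(x_old)
```

The intended design point, as described in the paper's abstract, is that information is kept in two spaces at once: exact kernel evaluations for the vectors still inside the budget, and an approximate random-feature representation for those that have been merged out.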
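The single-pass evaluation protocol quoted in the "Dataset Splits" row (10 runs over random permutations of the training stream, reporting the mean and standard deviation of the result and the time spent) can be expressed in a few lines. The sketch below reuses the DualSGDSketch class defined above; the file name ijcnn1.txt, the use of scikit-learn's LIBSVM-format loader, the mistake-rate metric, and the particular hyperparameter values are illustrative assumptions, not settings confirmed by the paper.

```python
import time
import numpy as np
from sklearn.datasets import load_svmlight_file

# Hypothetical path: ijcnn1 in LIBSVM format, as distributed on the LIBSVM website.
X, y = load_svmlight_file("ijcnn1.txt")
X = X.toarray()

mistake_rates, run_times = [], []
for run in range(10):                                   # 10 random permutations
    rng = np.random.RandomState(run)
    order = rng.permutation(len(y))
    model = DualSGDSketch(dim=X.shape[1], gamma=2.0 ** -4, eta=2.0 ** -2,
                          lam=2.0 ** 2 / len(y), B=100, D=200, k=10)
    mistakes, start = 0, time.time()
    for i in order:                                     # single pass over the stream
        x_i, y_i = X[i], y[i]
        mistakes += int(np.sign(model.decision(x_i)) != y_i)   # predict, then update
        model.partial_fit(x_i, y_i)
    mistake_rates.append(mistakes / len(y))
    run_times.append(time.time() - start)

print(f"mistake rate: {np.mean(mistake_rates):.4f} +/- {np.std(mistake_rates):.4f}")
print(f"time (s):     {np.mean(run_times):.2f} +/- {np.std(run_times):.2f}")
```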
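Finally, the hyperparameter grids quoted in the "Experiment Setup" row and the 80%/20% learning/validation partition translate directly into a small model-selection loop. The sketch below reuses X, y and DualSGDSketch from the blocks above; the random split, the fixed learning rate, and the decision to search only over λ and γ (the parameters the quote attributes to Dual SGD) are assumptions made for illustration.

```python
import numpy as np

N = len(y)  # number of training points, using X, y from the previous snippet

# Hyperparameter grids quoted in the paper (all values are powers of two).
grid = {
    "C":      [2.0 ** p for p in range(-5, 16, 2)],       # 2^-5, 2^-3, ..., 2^15
    "lambda": [2.0 ** p / N for p in range(-4, 17, 2)],    # 2^-4/N, ..., 2^16/N
    "gamma":  [2.0 ** p for p in (-8, -4, -2, 0, 2, 4, 8)],
    "eta":    [2.0 ** p for p in (-4, -3, -2, -1, 1, 2, 3, 4)],
}

# 80% of the training set for learning, 20% for validation (assumed random split).
rng = np.random.RandomState(0)
perm = rng.permutation(N)
learn_idx, valid_idx = perm[: int(0.8 * N)], perm[int(0.8 * N):]

best, best_err = None, np.inf
for lam in grid["lambda"]:
    for gamma in grid["gamma"]:
        model = DualSGDSketch(dim=X.shape[1], gamma=gamma, eta=0.25,
                              lam=lam, B=100, D=200)
        for i in learn_idx:                       # single pass over the learning portion
            model.partial_fit(X[i], y[i])
        err = np.mean([np.sign(model.decision(X[i])) != y[i] for i in valid_idx])
        if err < best_err:
            best, best_err = (lam, gamma), err

print("selected (lambda, gamma):", best, "validation error:", best_err)
```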