Learning SMaLL Predictors

Authors: Vikas Garg, Ofer Dekel, Lin Xiao

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We supplement the theoretical foundations of our work with an extensive empirical evaluation.
Researcher Affiliation | Collaboration | Vikas K. Garg (CSAIL, MIT, vgarg@csail.mit.edu); Ofer Dekel (Microsoft Research, oferd@microsoft.com); Lin Xiao (Microsoft Research, lin.xiao@microsoft.com)
Pseudocode | Yes | Algorithm 1: Customized Mirror-Prox algorithm for solving the saddle-point problem (13); Algorithm 2 (Proj_E): Projection onto the set E_j = {ξ_j ∈ ℝⁿ : ξ_{ji} ∈ [0, 1], ‖ξ_j‖₁ ≤ k}. (A projection sketch follows the table.)
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We experimented with OpenML data for two main reasons: (a) it contains many preprocessed binary datasets, and (b) the datasets come from diverse domains.
Dataset Splits | Yes | Since the datasets do not specify separate train, validation, and test sets, we measure test accuracy by averaging over five random train-test splits. ... We determined hyperparameters by 5-fold cross-validation. (See the evaluation sketch after the table.)
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments.
Software Dependencies | No | The paper mentions various algorithms (LSVM, RF, AB, LR, DT, kNN, RSVM, GB, GP, SMaLL, ProtoNN, Bonsai) but does not provide specific version numbers for software libraries or dependencies.
Experiment Setup | Yes | We determined hyperparameters by 5-fold cross-validation. The coefficient of the error term C in LSVM and ℓ2-regularized LR was selected from {0.1, 1, 10, 100}. In the case of RSVM, we also added 0.01 to the search set for C, and chose the best kernel among a radial basis function (RBF), polynomials of degree 2 and 3, and sigmoid. For the ensemble methods (RF, AB, GB), the number of base predictors was selected from the set {10, 20, 50}. The maximum number of features for RF estimators was optimized over the square-root and log selection criteria. We also found the best validation parameters for DT (gini or entropy for attribute selection), kNN (1, 3, 5, or 7 neighbors), and GP (RBF kernel scaled by a coefficient in the set {0.1, 1.0, 5} and dot-product kernel with inhomogeneity parameter set to 1). Finally, for our method SMaLL, we fixed μ = 0.1 and γ_t = 0.01, and searched over β_t = β ∈ {0.01, 0.001}. (See the grid-search sketch after the table.)
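
The projection routine quoted in the Pseudocode row operates on a box intersected with an ℓ1-budget constraint. The paper gives its own Proj_E procedure, which we do not reproduce here; the following is only a minimal sketch of one standard way to compute that Euclidean projection, assuming the set reads E = {ξ ∈ ℝⁿ : ξ_i ∈ [0, 1], ‖ξ‖₁ ≤ k} as recovered above (the function name project_onto_E is ours, not the paper's):

import numpy as np

def project_onto_E(v, k, tol=1e-10):
    """Euclidean projection of v onto E = {x : 0 <= x_i <= 1, sum(x) <= k}.

    A sketch via bisection on the KKT multiplier of the budget constraint;
    not the paper's Algorithm 2, just one standard construction.
    """
    x = np.clip(v, 0.0, 1.0)      # projection onto the box alone
    if x.sum() <= k:              # budget constraint already satisfied
        return x
    # Otherwise the projection has the form clip(v - tau, 0, 1) for the
    # unique tau >= 0 with sum(clip(v - tau, 0, 1)) == k; that sum is
    # non-increasing in tau, so bisection applies.
    lo, hi = 0.0, float(np.max(v))
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(v - hi, 0.0, 1.0)

In the paper's multi-prototype setting, the same projection would presumably be applied to each dual block ξ_j independently.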
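
The Open Datasets and Dataset Splits rows together describe the evaluation protocol: preprocessed binary OpenML datasets, with test accuracy averaged over five random train-test splits. Below is a minimal sketch of that protocol with scikit-learn; the dataset name, the 70/30 split ratio, and the stand-in classifier are placeholders, since the quoted text does not specify them:

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder binary dataset from OpenML; the paper does not name one here.
X, y = fetch_openml(name="diabetes", version=1, return_X_y=True, as_frame=False)

accuracies = []
for seed in range(5):  # five random train-test splits, as quoted
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, clf.predict(X_te)))

print(f"test accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")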
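
The Experiment Setup row lists per-baseline hyperparameter grids tuned by 5-fold cross-validation. Here is a hedged sketch that transcribes two of the quoted grids (LSVM and RSVM) into scikit-learn's GridSearchCV; the paper does not say which software it used, so this illustrates the protocol rather than reproducing the authors' tooling:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC, LinearSVC

# LSVM: error-term coefficient C chosen from {0.1, 1, 10, 100}.
lsvm_search = GridSearchCV(
    LinearSVC(), param_grid={"C": [0.1, 1, 10, 100]}, cv=5)

# RSVM: 0.01 added to the search set for C; kernel chosen among RBF,
# degree-2 and degree-3 polynomials, and sigmoid.
rsvm_search = GridSearchCV(
    SVC(),
    param_grid=[
        {"kernel": ["rbf", "sigmoid"], "C": [0.01, 0.1, 1, 10, 100]},
        {"kernel": ["poly"], "degree": [2, 3], "C": [0.01, 0.1, 1, 10, 100]},
    ],
    cv=5,
)

# Fit on the training portion of each random split, e.g.
# rsvm_search.fit(X_tr, y_tr); the selected hyperparameters are then
# available in rsvm_search.best_params_.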