Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy

Authors: Aryan Mokhtari, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Alejandro Ribeiro

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on various datasets confirm the possibility of increasing the sample size by a factor of 2 at each iteration, which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset. In this section, we study the performance of Ada Newton and compare it with state-of-the-art in solving a large-scale classification problem. (A back-of-the-envelope check of the two-pass claim follows the table.)
Researcher Affiliation | Academia | Aryan Mokhtari (University of Pennsylvania), Hadi Daneshmand (ETH Zurich, Switzerland), Aurelien Lucchi (ETH Zurich, Switzerland), Thomas Hofmann (ETH Zurich, Switzerland), Alejandro Ribeiro (University of Pennsylvania)
Pseudocode | Yes | Algorithm 1 Ada Newton
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | In the main paper we only use the protein homology dataset provided on the KDD Cup 2004 website.
Dataset Splits | No | The paper mentions a 'training set' and 'test set' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages, sample counts, or predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup | Yes | In our experiments, we use logistic loss and set the regularization parameters as c = 200 and V_n = 1/n. The stepsize of SGD in our experiments is 2 × 10^-2. The stepsize for SAGA is hand-optimized and the best performance has been observed for η = 0.2, which is the one that we use in the experiments. For Newton's method, the backtracking line search parameters are α = 0.4 and β = 0.5. In the implementation of Ada Newton we increase the size of the training set by a factor of 2 at each iteration, i.e., λ = 2... Moreover, the size of the initial training set is m0 = 124. For the warmup step... we run gradient descent with stepsize 10^-3 for 100 iterations. (These values are used in the code sketch after the table.)
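
For orientation, below is a minimal sketch of the adaptive-sample-size Newton loop described by the Pseudocode and Experiment Setup rows, using the reported values c = 200, V_n = 1/n, λ = 2, m0 = 124, and a 100-iteration gradient-descent warmup with stepsize 10^-3. This is our own reconstruction, not the authors' code: the paper's statistical-accuracy check (and the fallback when the doubling condition fails) is omitted, and the function names and data layout are assumptions.

```python
import numpy as np


def grad_hess(w, X, y, c, Vn):
    """Gradient and Hessian of the regularized logistic loss on n samples.

    R_n(w) = (1/n) * sum_i log(1 + exp(-y_i x_i^T w)) + (c * Vn / 2) * ||w||^2,
    with c = 200 and Vn = 1/n as reported in the paper.
    """
    n, p = X.shape
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))      # sigmoid(-y_i x_i^T w)
    grad = -(X.T @ (y * s)) / n + c * Vn * w
    hess = (X.T * (s * (1.0 - s))) @ X / n + c * Vn * np.eye(p)
    return grad, hess


def ada_newton(X, y, c=200.0, m0=124, lam=2, warmup_lr=1e-3, warmup_iters=100):
    """Sketch of the adaptive-sample-size Newton loop (Algorithm 1, simplified).

    Warm-up: plain gradient descent on the first m0 samples.  Main loop: the
    training-set size is multiplied by lam (= 2) at each iteration and a single
    Newton step with unit stepsize is taken, warm-started at the previous
    iterate.  The paper's statistical-accuracy condition, and its fallback when
    the doubling is too aggressive, are omitted here.
    """
    N, p = X.shape
    w = np.zeros(p)
    for _ in range(warmup_iters):              # warm-up on the initial subset
        g, _ = grad_hess(w, X[:m0], y[:m0], c, 1.0 / m0)
        w -= warmup_lr * g
    m = m0
    while m < N:
        m = min(lam * m, N)                    # grow the training set by factor lam
        g, H = grad_hess(w, X[:m], y[:m], c, 1.0 / m)
        w -= np.linalg.solve(H, g)             # full Newton step
    return w
```

For labels y in {-1, +1} and a feature matrix X of shape (N, p), ada_newton(X, y) reaches the full training set after about log2(N / m0) Newton steps; the cumulative number of samples touched is roughly 2N, which is the "two passes" behavior discussed next.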
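
Why doubling the sample size translates into roughly two passes over the data (a back-of-the-envelope check of the claim quoted in the Research Type row, not a statement taken from the paper): if the training-set size grows as m_k = m0 · 2^k until it reaches n, the total number of samples processed across all iterations is a geometric sum bounded by about 2n.

```latex
% Cumulative samples touched when the sample size doubles from m_0 up to n = m_0 2^K:
\sum_{k=0}^{K} m_0\, 2^{k} \;=\; m_0\!\left(2^{K+1}-1\right) \;<\; 2\, m_0 2^{K} \;=\; 2n,
% i.e. the overall work of Ada Newton amounts to about two passes over the full training set.
```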