Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
Authors: Aryan Mokhtari, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Alejandro Ribeiro
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on various datasets confirm the possibility of increasing the sample size by a factor of 2 at each iteration, which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset. In this section, we study the performance of Ada Newton and compare it with the state of the art in solving a large-scale classification problem. |
| Researcher Affiliation | Academia | Aryan Mokhtari (University of Pennsylvania), Hadi Daneshmand (ETH Zurich, Switzerland), Aurelien Lucchi (ETH Zurich, Switzerland), Thomas Hofmann (ETH Zurich, Switzerland), Alejandro Ribeiro (University of Pennsylvania) |
| Pseudocode | Yes | Algorithm 1 Ada Newton |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In the main paper we only use the protein homology dataset provided on the KDD Cup 2004 website. |
| Dataset Splits | No | The paper mentions a 'training set' and 'test set' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | In our experiments, we use logistic loss and set the regularization parameters as c = 200 and V_n = 1/n. The stepsize of SGD in our experiments is 2 × 10^-2. The stepsize for SAGA is hand-optimized and the best performance has been observed for η = 0.2, which is the one that we use in the experiments. For Newton’s method, the backtracking line search parameters are α = 0.4 and β = 0.5. In the implementation of Ada Newton we increase the size of the training set by a factor of 2 at each iteration, i.e., λ = 2... Moreover, the size of the initial training set is m0 = 124. For the warmup step... we run gradient descent with stepsize 10^-3 for 100 iterations. (A hedged code sketch of Algorithm 1 with these settings appears below the table.) |
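
Based on the "Pseudocode" row (Algorithm 1, Ada Newton), the following is a minimal Python sketch of the doubling scheme described in the paper: start from a small subsample, warm up with a few gradient-descent steps, then repeatedly enlarge the training subset by a factor λ and take a single Newton step on the regularized logistic risk R_m(w) = (1/m) Σ log(1 + exp(-y_i x_i^T w)) + (c·V_m/2)‖w‖², with V_m = 1/m. The function and variable names (`reg_logistic_grad_hess`, `ada_newton`) are our own; the paper's backtracking on the growth factor and its stopping checks are omitted, so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def reg_logistic_grad_hess(w, X, y, c):
    """Gradient and Hessian of the regularized logistic risk
    R_m(w) = (1/m) sum_i log(1 + exp(-y_i x_i^T w)) + (c*V_m/2) ||w||^2,
    with statistical accuracy V_m = 1/m, labels y_i in {-1, +1}."""
    m, d = X.shape
    V_m = 1.0 / m
    z = np.clip(y * (X @ w), -35.0, 35.0)   # clip margins for numerical stability
    s = 1.0 / (1.0 + np.exp(z))              # sigmoid(-y_i x_i^T w)
    grad = -(X.T @ (y * s)) / m + c * V_m * w
    D = s * (1.0 - s)                         # per-sample curvature weights
    hess = (X.T * D) @ X / m + c * V_m * np.eye(d)
    return grad, hess

def ada_newton(X, y, c=200.0, lam=2, m0=124,
               warmup_iters=100, warmup_step=1e-3):
    """Sketch of Ada Newton: warm up on the first m0 samples with gradient
    descent, then grow the subset by a factor lam and take one Newton step
    per stage until the full training set of size N is used."""
    N, d = X.shape
    w = np.zeros(d)
    # Warm-up: plain gradient descent on the initial m0 samples.
    for _ in range(warmup_iters):
        g, _ = reg_logistic_grad_hess(w, X[:m0], y[:m0], c)
        w -= warmup_step * g
    m = m0
    while m < N:
        m = min(lam * m, N)                   # enlarge the training subset
        g, H = reg_logistic_grad_hess(w, X[:m], y[:m], c)
        w -= np.linalg.solve(H, g)            # single Newton step on the enlarged subset
    return w
```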
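And a short usage example wiring in the settings from the "Experiment Setup" row (c = 200, λ = 2, m0 = 124, 100 warm-up gradient steps with stepsize 10^-3). The synthetic data below is only a stand-in for the KDD Cup 2004 protein homology dataset, whose loading and preprocessing are not specified in the table; the dimensions are arbitrary, and the call reuses the `ada_newton` sketch above.

```python
import numpy as np

# Hypothetical stand-in for the protein homology data (arbitrary sizes).
rng = np.random.default_rng(0)
N, d = 20000, 50
X = rng.standard_normal((N, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N))

# Settings reported in the Experiment Setup row.
w = ada_newton(X, y, c=200.0, lam=2, m0=124,
               warmup_iters=100, warmup_step=1e-3)
print("training error:", np.mean(np.sign(X @ w) != y))
```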