Adaptive Negative Curvature Descent with Applications in Non-convex Optimization
Authors: Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report some experimental results to justify the effectiveness of AdaNCD for both deterministic and stochastic non-convex optimization. We consider three problems, namely cubic regularization, regularized non-linear least squares, and a one-hidden-layer neural network (NN) problem. |
| Researcher Affiliation | Collaboration | Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang. Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Intellifusion; JD AI Research. {mingrui-liu, tianbao-yang}@uiowa.edu |
| Pseudocode | Yes | Algorithm 1: AdaNCD^det(x, α, δ, f(x)) (a hedged sketch of one AdaNCD^det step appears below the table) |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | We use w1a data (n = 2477, d = 300) from the libsvm website [8], and set λ = 1. [...] We use 12,665 examples from the MNIST dataset [11] that belong to two categories 0 and 1 as input data, where the input feature dimensionality is 784. (A data-loading sketch appears below the table.) |
| Dataset Splits | No | The paper mentions using datasets (w1a, MNIST) but does not provide specific details on how they were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Lanczos method and datasets from libsvm, but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we choose α = 1/2, i.e., ϵ2 = √ϵ1. The parameters L1, L2 are tuned for the non-adaptive algorithm NCG, and the same values are used in the other algorithms. The search range for L1 and L2 is 10^(-5:1:5), i.e., powers of 10 from 10^-5 to 10^5. The mini-batch size used in S-AdaNCG and AdaNCD-SCSG is set to 50 for cubic regularization and 128 for the other two tasks. We use the Lanczos method for NCS. For non-adaptive algorithms, the number of iterations in each call of the Lanczos method is set to min(C log(d)/√ϵ2, d); for adaptive algorithms, it is set to min(C log(d)/√(max(ϵ2, ‖g(x)‖^(1/2))), d), where g(x) is either a full gradient or a mini-batch stochastic gradient. The value of C is set to L1. We set ϵ1 = 10^-2 for cubic regularization, and ϵ1 = 10^-4 for the other two tasks. (These iteration budgets are sketched below the table.) |
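The Pseudocode row quotes Algorithm 1 (AdaNCD^det). As a reading aid, here is a minimal Python sketch of one such step under stated assumptions: the adaptive threshold max(ϵ2, ‖∇f(x)‖^α) mirrors the quantity in the Experiment Setup row, but the concrete step sizes (|curvature|/L2 for the negative-curvature step, 1/L1 for the gradient step) follow the standard negative-curvature-descent template rather than a verbatim transcription of the paper, and `adancd_det_step`, `grad`, and `hvp` are hypothetical names.

```python
# Hedged sketch of one AdaNCD^det-style step; step-size rules are assumed,
# not copied from the paper.
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def adancd_det_step(x, grad, hvp, alpha=0.5, eps2=1e-2, L1=1.0, L2=1.0):
    """Take a negative-curvature step if the Lanczos-based NCS finds
    curvature below the adaptive threshold; otherwise take a gradient step.

    grad: callable returning the gradient at x.
    hvp:  callable (x, v) -> Hessian-vector product (avoids forming the Hessian).
    """
    g = grad(x)
    # Adaptive NCS accuracy max(eps2, ||g||^alpha): the same quantity the
    # quoted setup uses to budget Lanczos iterations.
    thresh = max(eps2, np.linalg.norm(g) ** alpha)
    H = LinearOperator((x.size, x.size), matvec=lambda v: hvp(x, v))
    lam, V = eigsh(H, k=1, which='SA')  # smallest (most negative) eigenpair
    v, curv = V[:, 0], lam[0]
    if curv <= -thresh / 2:
        # Assumed curvature step: length |curv|/L2 along -sign(v.g) * v,
        # where L2 plays the role of the Hessian Lipschitz constant.
        s = np.sign(v @ g) or 1.0  # break the tie when v is orthogonal to g
        return x - (abs(curv) / L2) * s * v
    # Otherwise a plain gradient step with step size 1/L1.
    return x - g / L1
```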
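The Open Datasets row names w1a and an MNIST subset but gives no loading procedure. The sketch below is one plausible preparation path, assuming scikit-learn loaders and a local copy of `w1a` downloaded from the libsvm site; 12,665 matches exactly the number of 0s and 1s (5,923 + 6,742) in the 60k MNIST training split.

```python
# Hedged data-preparation sketch; the file name, download step, and use of
# scikit-learn loaders are assumptions, since the paper only names the datasets.
import numpy as np
from sklearn.datasets import load_svmlight_file, fetch_openml

# w1a (n = 2477, d = 300), downloaded beforehand from the libsvm website.
X_w1a, y_w1a = load_svmlight_file("w1a", n_features=300)
assert X_w1a.shape == (2477, 300)

# MNIST digits 0 and 1 from the 60k training split. OpenML's mnist_784 is
# ordered train-then-test; that ordering is an assumption worth verifying.
X, y = fetch_openml("mnist_784", version=1, as_frame=False, return_X_y=True)
X_tr, y_tr = X[:60000], y[:60000].astype(int)
mask = (y_tr == 0) | (y_tr == 1)
X01, y01 = X_tr[mask], y_tr[mask]
assert X01.shape == (12665, 784)
```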
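Finally, a minimal helper for the Lanczos iteration budgets quoted in the Experiment Setup row. The square roots are reconstructed from the extracted text (a stray "p" where √ stood), and C = L1 as stated; `lanczos_budget` is a hypothetical name.

```python
# Hedged sketch of the per-call Lanczos iteration budgets, assuming the
# reconstructed formulas min(C log(d)/sqrt(eps2), d) for non-adaptive runs
# and min(C log(d)/sqrt(max(eps2, ||g||^alpha)), d) for adaptive ones.
import numpy as np

def lanczos_budget(d, eps2, C, g=None, alpha=0.5):
    """Number of Lanczos iterations per NCS call.

    g is None for the non-adaptive budget; otherwise g is the full or
    mini-batch stochastic gradient used by the adaptive budget.
    """
    tol = eps2 if g is None else max(eps2, np.linalg.norm(g) ** alpha)
    return int(min(C * np.log(d) / np.sqrt(tol), d))
```

For example, on the MNIST task (d = 784, ϵ1 = 10^-4, so ϵ2 = √ϵ1 = 10^-2 under α = 1/2), the non-adaptive budget is min(C·log(784)/0.1, 784) ≈ min(66.6·C, 784) iterations per call.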