Adaptive Negative Curvature Descent with Applications in Non-convex Optimization

Authors: Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we report some experimental results to justify the effectiveness of AdaNCD for both deterministic and stochastic non-convex optimization. We consider three problems, namely cubic regularization, regularized non-linear least squares, and a one-hidden-layer neural network (NN) problem." |
| Researcher Affiliation | Collaboration | Mingrui Liu, Zhe Li, Xiaoyu Wang, Jinfeng Yi, Tianbao Yang; Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Intellifusion; JD AI Research; {mingrui-liu, tianbao-yang}@uiowa.edu |
| Pseudocode | Yes | Algorithm 1: AdaNCD^det(x, α, δ, f(x)). A hedged Python sketch of one such step appears below the table. |
| Open Source Code | No | No explicit statement or link providing access to source code for the method described in this paper was found. |
| Open Datasets | Yes | "We use the w1a data (n = 2477, d = 300) from the libsvm website [8], and set λ = 1. [...] We use 12,665 examples from the MNIST dataset [11] that belong to the two categories 0 and 1 as input data, where the input feature dimensionality is 784." A loading sketch appears below the table. |
| Dataset Splits | No | The paper uses the w1a and MNIST datasets but does not describe how they were split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not specify the hardware (CPU or GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the Lanczos method and libsvm datasets but names no software dependencies or version numbers. |
| Experiment Setup | Yes | "For all experiments, we choose α = 1/2, i.e., ϵ2 = ϵ1^{1/2}. The parameters L1, L2 are tuned for the non-adaptive algorithm NCG, and the same values are used in the other algorithms. The search range for L1 and L2 is 10^{-5:1:5}. The mini-batch size used in S-AdaNCG and AdaNCD-SCSG is set to 50 for cubic regularization and 128 for the other two tasks. We use the Lanczos method for NCS. For non-adaptive algorithms, the number of iterations in each call of the Lanczos method is set to min(C log(d)/√ϵ2, d); for adaptive algorithms, it is set to min(C log(d)/√max(ϵ2, ‖g(x)‖^{1/2}), d), where g(x) is either a full gradient or a mini-batch stochastic gradient. The value of C is set to L1. We set ϵ1 = 10^-2 for cubic regularization and ϵ1 = 10^-4 for the other two tasks." The iteration-budget helper below the table encodes these caps. |
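
For reference, here is a minimal Python sketch of what a single deterministic AdaNCD step could look like. It assumes the standard negative-curvature descent update x+ = x − (|λ|/L2)·sign(vᵀg)·v and uses SciPy's Lanczos-based eigsh for the negative curvature search (NCS); the function name ada_ncd_det_sketch, the hvp callback, and the acceptance threshold are illustrative assumptions, not the authors' Algorithm 1 verbatim.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def ada_ncd_det_sketch(x, grad, hvp, alpha=0.5, eps2=1e-2, L2=1.0):
    """One deterministic adaptive negative-curvature step (illustrative).

    grad : callable, x -> gradient of f at x.
    hvp  : callable, (x, v) -> Hessian-vector product, so the Hessian is
           never formed explicitly.
    The adaptive accuracy max(eps2, ||grad||^alpha) mirrors the paper's
    description; the step rule below is the generic NCD update and may
    differ in constants and acceptance test from the authors' Algorithm 1.
    The failure probability delta of randomized Lanczos is absorbed into
    eigsh's tolerance here, a simplification.
    """
    d = x.shape[0]
    g = grad(x)
    # Coarser curvature accuracy while the gradient is still large.
    eps = max(eps2, np.linalg.norm(g) ** alpha)

    # Smallest Hessian eigenpair via Lanczos (ARPACK), matrix-free.
    H = LinearOperator((d, d), matvec=lambda v: hvp(x, v))
    vals, vecs = eigsh(H, k=1, which='SA', tol=eps / 2)
    lam, v = vals[0], vecs[:, 0]

    if lam < -eps / 2:
        # Negative curvature found: step along -sign(v.g) * v with length
        # |lam| / L2, a descent step for Hessians with L2-Lipschitz continuity.
        sign = 1.0 if v @ g >= 0 else -1.0
        return x - (abs(lam) / L2) * sign * v
    return x  # curvature not sufficiently negative; leave x unchanged
```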
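
Both datasets quoted above are publicly available. A possible loading sketch with scikit-learn (an assumed toolchain; the paper does not name its software) is:

```python
from sklearn.datasets import load_svmlight_file, fetch_openml

# w1a from the libsvm collection (n = 2477, d = 300); download the file
# from the libsvm website first.
X_w1a, y_w1a = load_svmlight_file('w1a', n_features=300)

# MNIST via OpenML; keeping digits 0 and 1 from the standard 60,000-example
# training split yields the 12,665 examples (d = 784) quoted above.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:60000], y[:60000]
mask = (y_train == '0') | (y_train == '1')
X01, y01 = X_train[mask], y_train[mask]  # shape (12665, 784)
```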
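
The Lanczos iteration caps in the setup quote translate directly into a small helper; this is our reading of the quoted formulas, with the 10^{-5:1:5} search range expressed as a logspace grid.

```python
import numpy as np

def lanczos_budget(d, eps2, C, grad_norm=None):
    """Per-call Lanczos iteration cap, per the experiment-setup quote.

    Non-adaptive: min(C * log(d) / sqrt(eps2), d).
    Adaptive:     min(C * log(d) / sqrt(max(eps2, ||g(x)||^{1/2})), d),
    where g(x) is a full or mini-batch gradient and C = L1 in the paper.
    """
    target = eps2 if grad_norm is None else max(eps2, np.sqrt(grad_norm))
    return int(min(np.ceil(C * np.log(d) / np.sqrt(target)), d))

# The quoted 10^{-5:1:5} search range for L1 and L2:
grid = np.logspace(-5, 5, 11)  # 1e-5, 1e-4, ..., 1e5
```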