Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition
Authors: Mingrui Liu, Tianbao Yang
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Results: We conduct some experiments to demonstrate the effectiveness of adaAGC for solving problems of type (1). Specifically, we compare adaAGC, PG with option II that returns the solution with the historically minimal proximal gradient, FISTA, and unconditional restarting FISTA (urFISTA) [6] for optimizing the squared hinge loss (classification), square loss (regression), and Huber loss (with ρ = 1) (regression) with ℓ1 and ℓ∞ regularization, which are cases of (11); we also consider the ℓ1 constrained ℓp norm regression (7) with varying p. We use three datasets from the LIBSVM website [5], which are splice (n = 1000, d = 60) for classification, and bodyfat (n = 252, d = 14), cpusmall (n = 8192, d = 12) for regression. (The losses are sketched after the table.) |
| Researcher Affiliation | Academia | Mingrui Liu, Tianbao Yang, Department of Computer Science, The University of Iowa, Iowa City, IA 52242, {mingrui-liu, tianbao-yang}@uiowa.edu |
| Pseudocode | Yes | Algorithm 1: ADG |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We use three datasets from the LIBSVM website [5], which are splice (n = 1000, d = 60) for classification, and bodyfat (n = 252, d = 14), cpusmall (n = 8192, d = 12) for regression. (A loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions training (e.g., 'training examples'), but it does not specify exact dataset split percentages (e.g., '80/10/10 split'), absolute sample counts, or refer to predefined splits with citations for training, validation, or test sets. No cross-validation setup is mentioned either. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models, processor types, or memory used for running the experiments. It only mentions the datasets used. |
| Software Dependencies | No | The paper does not list specific software components with their version numbers (e.g., Python, PyTorch, CUDA, or specific solvers like CPLEX) that are needed to replicate the experiments. |
| Experiment Setup | Yes | For problems covered by (11), we fix λ = 1/n, and the parameter s in (7) is set to s = 100. We use backtracking in PG, adaAGC, and FISTA to search for the smoothness parameter. In adaAGC, we set c0 = 2, γ = 2 for the ℓ1 constrained ℓp norm regression and c0 = 10, γ = 2 for the remaining problems. For fairness, urFISTA and adaAGC use the same initial estimate of the unknown parameter (i.e., c). Each algorithm starts at the same initial point, which is set to zero, and we stop each algorithm when the norm of its proximal gradient falls below a prescribed threshold ϵ, reporting the total number of proximal mappings. (The stopping rule is sketched after the table.) |
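The losses and regularizer named in the Research Type row are standard objects. Below is a minimal sketch of the squared hinge, square, and Huber losses and the ℓ1 proximal mapping, assuming a linear model `w` on data `(X, y)`; all function names and signatures are illustrative, not taken from the paper.

```python
import numpy as np

def squared_hinge_loss(w, X, y):
    # Classification with y in {-1, +1}: (1/n) * sum_i max(0, 1 - y_i x_i^T w)^2
    margins = np.maximum(0.0, 1.0 - y * (X @ w))
    return np.mean(margins ** 2)

def square_loss(w, X, y):
    # Regression: (1/(2n)) * ||Xw - y||^2
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def huber_loss(w, X, y, rho=1.0):
    # Regression: quadratic for |r| <= rho, linear beyond (the paper uses rho = 1).
    r = X @ w - y
    quad = 0.5 * r ** 2
    lin = rho * (np.abs(r) - 0.5 * rho)
    return np.mean(np.where(np.abs(r) <= rho, quad, lin))

def prox_l1(v, t):
    # Proximal mapping of t * ||.||_1 (soft-thresholding); for lambda * ||w||_1,
    # call with t = step_size * lambda. The paper fixes lambda = 1/n.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```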
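The Open Datasets row points at the LIBSVM repository. Here is a short loading sketch using scikit-learn's `load_svmlight_file`, assuming the three files have been downloaded locally under the names shown; the file paths are assumptions, not from the paper.

```python
from sklearn.datasets import load_svmlight_file

# Assumed local file names for the three LIBSVM downloads.
X_splice, y_splice = load_svmlight_file("splice")        # n = 1000, d = 60 (classification)
X_bodyfat, y_bodyfat = load_svmlight_file("bodyfat")     # n = 252,  d = 14 (regression)
X_cpusmall, y_cpusmall = load_svmlight_file("cpusmall")  # n = 8192, d = 12 (regression)
```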
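The stopping rule and the reported metric in the Experiment Setup row (stop when the proximal-gradient norm falls below ϵ; report the total number of proximal mappings) can be mirrored with the plain PG comparator. The sketch below is a generic proximal-gradient loop with backtracking on the smoothness constant; it is not the paper's adaAGC, whose adaptive restarting is not reproduced here, and all names are illustrative.

```python
import numpy as np

def prox_gradient(f, grad_f, prox, x0, eps=1e-6, L0=1.0, max_iter=100000):
    """Generic proximal gradient with backtracking on the smoothness constant L.

    f / grad_f evaluate the smooth part; prox(v, t) is the proximal mapping of
    the regularizer with step size t. Stops when the proximal gradient
    G_L(x) = L * (x - prox(x - grad_f(x)/L, 1/L)) has norm below eps, and
    returns the total number of proximal mappings, the metric the paper reports.
    """
    x = np.asarray(x0, dtype=float).copy()
    L = L0
    n_prox = 0
    for _ in range(max_iter):
        g = grad_f(x)
        while True:  # backtracking: double L until the descent condition holds
            x_new = prox(x - g / L, 1.0 / L)
            n_prox += 1
            d = x_new - x
            if f(x_new) <= f(x) + g @ d + 0.5 * L * (d @ d):
                break
            L *= 2.0
        if L * np.linalg.norm(x - x_new) < eps:  # proximal-gradient norm test
            return x_new, n_prox
        x = x_new
    return x, n_prox
```

With `f = square_loss` (plus its gradient) and `prox = prox_l1` from the first sketch, this loop follows the initialization, stopping, and counting protocol quoted in the row above.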