StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

Authors: Tyler B. Johnson, Carlos Guestrin

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented CD, CD with safe screening, StingyCD, and StingyCD+ to solve (P). Coordinates are updated in round-robin fashion. We normalize columns of A and include an unregularized intercept term. We also remove features that have nonzero values in fewer than ten examples. For CD with safe screening, we apply the test from (Johnson & Guestrin, 2016), which is state-of-the-art to our knowledge. Following (Fercoq et al., 2015), screening is performed after every ten epochs. Performing screening requires a full pass over the data, which is non-negligible. We compare the algorithms using a financial document dataset (finance)1 and an insurance claim prediction task (allstate)2. finance contains 1.6 × 10^4 examples, 5.5 × 10^5 features, and 8.8 × 10^7 nonzero values. The result included in this section uses regularization λ = 0.05λ_max, where λ_max is the smallest λ value that results in an all-zero solution. The solution contains 1746 nonzero entries. The allstate data contains 2.5 × 10^5 examples, 1.5 × 10^4 features, and 1.2 × 10^8 nonzero values. For this problem, we set λ = 0.05λ_max, resulting in 1404 selected features. We include results for additional λ values in Appendix G. StingyCD seems to have slightly greater impact when λ is larger, but the results generally do not change much with λ. We evaluate the algorithms using three metrics. The first metric is relative suboptimality, defined as Relative suboptimality = (f(x^(t)) − f(x*)) / f(x*), where x* is an optimal solution. The other metrics are support set precision and recall.
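The three evaluation metrics quoted above can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the authors' code; the function names and the `tol` threshold for deciding membership in the support are assumptions.

```python
import numpy as np

def relative_suboptimality(f_xt, f_star):
    # Relative suboptimality = (f(x^(t)) - f(x*)) / f(x*)
    return (f_xt - f_star) / f_star

def support_metrics(x, x_star, tol=1e-12):
    # Precision and recall of the support (nonzero entries) of an
    # iterate x against the support of a reference solution x_star.
    s = np.abs(x) > tol            # estimated support
    s_star = np.abs(x_star) > tol  # true support
    hits = np.sum(s & s_star)
    precision = hits / max(np.sum(s), 1)
    recall = hits / max(np.sum(s_star), 1)
    return precision, recall
```

The `max(..., 1)` guards avoid division by zero when either support is empty, e.g. at the all-zero initial iterate.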
Researcher Affiliation | Academia | Tyler B. Johnson and Carlos Guestrin, University of Washington, Seattle, WA.
Pseudocode | Yes | Algorithm 1: Coordinate descent for solving (P); Algorithm 2: StingyCD for solving (P)
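Algorithm 1 (plain coordinate descent for the lasso problem (P)) can be sketched as below. This is a hedged reimplementation under assumed names (`A`, `b`, `lam`), not the authors' code; StingyCD (Algorithm 2) adds a cheap geometric test to this inner loop that safely skips coordinates whose update is guaranteed to be zero, which is omitted here.

```python
import numpy as np

def lasso_cd(A, b, lam, epochs=10):
    """Round-robin coordinate descent for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    n, d = A.shape
    x = np.zeros(d)
    r = b - A @ x                   # residual, kept up to date incrementally
    col_sq = (A ** 2).sum(axis=0)   # squared column norms ||A_i||^2
    for _ in range(epochs):
        for i in range(d):          # round-robin coordinate order
            if col_sq[i] == 0.0:
                continue
            # exact minimization over coordinate i: soft-thresholding
            rho = A[:, i] @ r + col_sq[i] * x[i]
            x_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[i]
            delta = x_new - x[i]
            if delta != 0.0:        # StingyCD's goal: predict this cheaply in advance
                r -= delta * A[:, i]
                x[i] = x_new
    return x
```

Note that each update already costs O(NNZ(A_i)) for the inner product with the residual; StingyCD's skip test is O(1) per coordinate, which is why avoiding updates that end with `delta == 0` saves real work.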
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | Yes | We compare the algorithms using a financial document dataset (finance)1 and an insurance claim prediction task (allstate)2. 1https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-log1p; 2https://www.kaggle.com/c/allstate-claims-severity
Dataset Splits | No | The paper describes the datasets used and performance metrics related to optimization convergence (e.g., relative suboptimality, support set precision/recall), but it does not specify explicit train/validation/test splits for model generalization, nor does it detail a cross-validation setup.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Coordinates are updated in round-robin fashion. We normalize columns of A and include an unregularized intercept term. We also remove features that have nonzero values in fewer than ten examples. For CD with safe screening, we apply the test from (Johnson & Guestrin, 2016), which is state-of-the-art to our knowledge. Following (Fercoq et al., 2015), screening is performed after every ten epochs. The result included in this section uses regularization λ = 0.05λ_max, where λ_max is the smallest λ value that results in an all-zero solution. In practice, we define ξ^(t) = NNZ(x^(t−1)).