StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent

Authors: Tyler B. Johnson, Carlos Guestrin

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented CD, CD with safe screening, StingyCD, and StingyCD+ to solve (P). Coordinates are updated in round-robin fashion. We normalize columns of A and include an unregularized intercept term. We also remove features that have nonzero values in fewer than ten examples. For CD with safe screening, we apply the test from (Johnson & Guestrin, 2016), which is state-of-the-art to our knowledge. Following (Fercoq et al., 2015), screening is performed after every ten epochs. Performing screening requires a full pass over the data, which is non-negligible. We compare the algorithms using a financial document dataset (finance)1 and an insurance claim prediction task (allstate)2. finance contains 1.6 × 10^4 examples, 5.5 × 10^5 features, and 8.8 × 10^7 nonzero values. The result included in this section uses regularization λ = 0.05λ_max, where λ_max is the smallest λ value that results in an all-zero solution. The solution contains 1746 nonzero entries. The allstate data contains 2.5 × 10^5 examples, 1.5 × 10^4 features, and 1.2 × 10^8 nonzero values. For this problem, we set λ = 0.05λ_max, resulting in 1404 selected features. We include results for additional λ values in Appendix G. StingyCD seems to have slightly greater impact when λ is larger, but the results generally do not change much with λ. We evaluate the algorithms using three metrics. The first metric is relative suboptimality, defined as Relative suboptimality = (f(x^(t)) − f(x*)) / f(x*), where x* is an optimal solution. The other metrics are support set precision and recall.
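The three evaluation metrics quoted above can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the authors' code; the function names and the `tol` threshold for deciding membership in the support are assumptions.

```python
import numpy as np

def relative_suboptimality(f_xt, f_star):
    # Relative suboptimality = (f(x^(t)) - f(x*)) / f(x*)
    return (f_xt - f_star) / f_star

def support_metrics(x, x_star, tol=1e-12):
    # Precision and recall of the support (nonzero entries) of an
    # iterate x against the support of a reference solution x_star.
    s = np.abs(x) > tol            # estimated support
    s_star = np.abs(x_star) > tol  # true support
    hits = np.sum(s & s_star)
    precision = hits / max(np.sum(s), 1)
    recall = hits / max(np.sum(s_star), 1)
    return precision, recall
```

The `max(..., 1)` guards avoid division by zero when either support is empty, e.g. at the all-zero initial iterate.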
Researcher Affiliation | Academia | Tyler B. Johnson and Carlos Guestrin, University of Washington, Seattle, WA.
Pseudocode | Yes | Algorithm 1: Coordinate descent for solving (P); Algorithm 2: StingyCD for solving (P)
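Algorithm 1 (plain coordinate descent for the lasso problem (P)) can be sketched as below. This is a hedged reimplementation under assumed names (`A`, `b`, `lam`), not the authors' code; StingyCD (Algorithm 2) adds a cheap geometric test to this inner loop that safely skips coordinates whose update is guaranteed to be zero, which is omitted here.

```python
import numpy as np

def lasso_cd(A, b, lam, epochs=10):
    """Round-robin coordinate descent for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    n, d = A.shape
    x = np.zeros(d)
    r = b - A @ x                   # residual, kept up to date incrementally
    col_sq = (A ** 2).sum(axis=0)   # squared column norms ||A_i||^2
    for _ in range(epochs):
        for i in range(d):          # round-robin coordinate order
            if col_sq[i] == 0.0:
                continue
            # exact minimization over coordinate i: soft-thresholding
            rho = A[:, i] @ r + col_sq[i] * x[i]
            x_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[i]
            delta = x_new - x[i]
            if delta != 0.0:        # StingyCD's goal: predict this cheaply in advance
                r -= delta * A[:, i]
                x[i] = x_new
    return x
```

Note that each update already costs O(NNZ(A_i)) for the inner product with the residual; StingyCD's skip test is O(1) per coordinate, which is why avoiding updates that end with `delta == 0` saves real work.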
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | Yes | We compare the algorithms using a financial document dataset (finance)1 and an insurance claim prediction task (allstate)2. 1https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-log1p; 2https://www.kaggle.com/c/allstate-claims-severity
Dataset Splits | No | The paper describes the datasets used and performance metrics related to optimization convergence (e.g., relative suboptimality, support set precision/recall), but it does not specify explicit train/validation/test splits for model generalization, nor does it detail a cross-validation setup.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Coordinates are updated in round-robin fashion. We normalize columns of A and include an unregularized intercept term. We also remove features that have nonzero values in fewer than ten examples. For CD with safe screening, we apply the test from (Johnson & Guestrin, 2016), which is state-of-the-art to our knowledge. Following (Fercoq et al., 2015), screening is performed after every ten epochs. The result included in this section uses regularization λ = 0.05λ_max, where λ_max is the smallest λ value that results in an all-zero solution. In practice, we define ξ^(t) = NNZ(x^(t−1)).