Anytime Guarantees under Heavy-Tailed Data
Authors: Matthew J. Holland
AAAI 2022, pp. 6918-6925
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we complement the preceding theoretical analysis with an application of the proposed learning strategy to real-world benchmark datasets. The practical utility of various gradient truncation mechanisms has already been well-studied in the literature (Chen, Su, and Xu 2017; Prasad et al. 2018; Lecué, Lerasle, and Mathieu 2018; Holland and Ikeda 2019), and thus our chief point of interest here is if and when the feedback scheme utilized in Algorithm 1 can outperform the traditional feedback mechanism given by (2), under a convex, differentiable true objective. Put more succinctly, the key question is: is there a practical benefit to querying at points with guarantees? Experimental setup: Considering the context of key related work (Gorbunov, Danilova, and Gasnikov 2020; Nazin et al. 2019), we focus on averaged SGD as our baseline, and consider several real-world classification datasets of varying size, using standard multi-class logistic regression as our model. Results and discussion: Our results are summarized in Figure 1, which plots the average training and test losses. |
| Researcher Affiliation | Academia | Matthew J. Holland, Osaka University |
| Pseudocode | Yes | Algorithm 1: Anytime robust online-to-batch conversion. |
| Open Source Code | Yes | A public repository including all experimental code has been published: https://github.com/feedbackward/anytime |
| Open Datasets | Yes | For CIFAR-10, we observe that the robustified version performs worse than vanilla anytime averaged SGD; this looks to be due to the simple h̃ = h1 setting, and can be readily mitigated by updating h̃ after one pass over the data. |
| Dataset Splits | No | The paper specifies a training and test split ("the training set is of size ntr := 0.8n, and the test set is of size n − ntr") but does not explicitly mention a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only mentions the implementation environment: "Everything is implemented by hand in Python (ver. 3.8), making significant use of the numpy library (ver. 1.20)." |
| Software Dependencies | Yes | Everything is implemented by hand in Python (ver. 3.8), making significant use of the numpy library (ver. 1.20). |
| Experiment Setup | Yes | For all methods, the step size in update (17) is fixed at βt = 2/√ntr, for all steps t; this setting is appropriate for Anytime-* methods due to Corollary 7, and also for SGD-Ave based on standard results such as Nemirovski et al. (2009, Sec. 2.3). The (Gt) are obtained by direct computation of the logistic loss gradients, averaged over a mini-batch of size 8; this size was set arbitrarily for speed and stability, and no other mini-batch values were tested. Furthermore, for each method and each trial, the initial value h1 is randomly generated in a dimension-wise fashion from the uniform distribution on the interval [−0.05, 0.05]. All raw input features are normalized to the unit interval [0, 1] in a per-feature fashion. We do not do any regularization, for any method being tested. ... First, as a simple choice of anchors h̃ and g̃, we set h̃ = h1 and estimate g̃ using the empirical mean on the training data set; ... As for the thresholds (ct) used in the Process sub-routine, we set ct = √(ntr / log(δ⁻¹)) for all t, with a confidence level of δ = 0.05 fixed throughout. |
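To make the quoted setup concrete, the sketch below shows the pre-processing, initialization, and mini-batch gradient computation described in the Experiment Setup row: per-feature normalization of raw inputs to [0, 1], dimension-wise uniform initialization on [−0.05, 0.05], and multi-class logistic (softmax cross-entropy) loss gradients averaged over a mini-batch. This is a minimal numpy illustration under our own reading of the quoted text; all function names are hypothetical and are not taken from the author's released repository.

```python
import numpy as np

def normalize_per_feature(X):
    """Rescale each raw input feature to the unit interval [0, 1] (per-feature)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (X - lo) / span

def init_weights(rng, d, k):
    """Dimension-wise uniform initialization on [-0.05, 0.05]."""
    return rng.uniform(low=-0.05, high=0.05, size=(d, k))

def logistic_grad(W, X_batch, y_batch, k):
    """Multi-class logistic loss gradient, averaged over the mini-batch."""
    scores = X_batch @ W                              # shape (b, k)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)         # softmax probabilities
    onehot = np.eye(k)[y_batch]                       # integer labels -> one-hot
    return X_batch.T @ (probs - onehot) / len(y_batch)  # shape (d, k)
```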
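Building on those helpers, a minimal sketch of the SGD-Ave baseline under the quoted settings follows: a fixed step size βt = 2/√ntr, mini-batches of size 8, no regularization, and a uniform average of the iterates. The threshold ct = √(ntr / log(1/δ)) with δ = 0.05 is computed only to mirror the quoted formula; the robust Process sub-routine of Algorithm 1 is not reproduced here, and the single-pass step count is our own assumption.

```python
def sgd_average(X, y, k, rng, batch_size=8, delta=0.05):
    """Plain averaged-SGD baseline (SGD-Ave) under the settings quoted above."""
    n_tr, d = X.shape
    beta = 2.0 / np.sqrt(n_tr)                  # fixed step size for all steps t
    c_t = np.sqrt(n_tr / np.log(1.0 / delta))   # quoted threshold; unused by this plain baseline
    W = init_weights(rng, d, k)
    W_sum = np.zeros_like(W)
    num_steps = n_tr // batch_size              # assumption: roughly one pass over the training data
    for t in range(num_steps):
        idx = rng.choice(n_tr, size=batch_size, replace=False)
        G_t = logistic_grad(W, X[idx], y[idx], k)
        W = W - beta * G_t
        W_sum += W
    return W_sum / num_steps                    # uniform average of the iterates

# Hypothetical usage on an already-loaded dataset (X_raw, y, k classes):
# rng = np.random.default_rng(0)
# W_avg = sgd_average(normalize_per_feature(X_raw), y, k, rng)
```

The averaging of iterates here reflects the standard online-to-batch conversion used by the SGD-Ave baseline; the paper's Anytime-* methods replace this with the feedback scheme of Algorithm 1, whose details are not reproduced in this table.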