Better generalization with less data using robust gradient descent
Authors: Matthew Holland, Kazushi Ikeda
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails. (Section 4, Empirical analysis:) The chief goal of our numerical experiments is to elucidate the relationship between factors of the learning task (e.g., sample size, model dimension, underlying data distribution) and the behaviour of the robust gradient procedure proposed in Algorithm 1. |
| Researcher Affiliation | Academia | (1) Institute of Scientific and Industrial Research, Osaka University; (2) Division of Information Science, Nara Institute of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 (Robust gradient descent outline). Inputs: ŵ(0), T > 0. For t = 0, 1, ..., T−1: D(t) ← {l′(ŵ(t); z_i)}_{i=1,...,n} (update loss gradients); σ̂(t) ← RESCALE(D(t)) (Eqn. (4)); θ̂(t) ← LOCATE(D(t), σ̂(t)) (Eqns. (3), (5)); ŵ(t+1) ← ŵ(t) − α(t) θ̂(t) (plug in to update). Return: ŵ(T). (A runnable sketch of this loop appears after the table.) |
| Open Source Code | No | No explicit statement about providing source code for the methodology described in this paper or a link to a repository was found. |
| Open Datasets | Yes | We use three well-known data sets for benchmarking: the CIFAR-10 data set of tiny images (ten classes), the MNIST data set of handwritten digits (ten classes), and the protein homology dataset (two classes) made popular by its inclusion in the KDD Cup. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts for train/validation/test, or citations to predefined splits) needed for reproduction was found. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running experiments were found. |
| Software Dependencies | No | The paper mentions the 'SciPy scientific computation library' and the Python 'time' module but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For these first tests, we run three procedures. First is ideal gradient descent, denoted oracle, which assumes the objective function R is known; this corresponds to (1). Second, as a standard approximate procedure (2), we use ERM-GD, denoted erm and discussed at the start of Section 2, which approximates the optimal procedure using the empirical risk. Against these two benchmarks, we compare our Algorithm 1, denoted rgd, as a robust alternative for (2). Settings: n = 500, d = 2, α(t) = 0.1 for all t. Settings: n = 500, α(t) = 0.1 for all t. We set T = 25 for all settings. We initialize RGD to the OLS solution, with confidence δ = 0.005, and α(t) = 0.1 for all iterations. The maximum number of iterations is T = 100; the routine finishes after hitting this maximum or when the absolute value of the gradient falls below 0.001 for all conditions. All learning algorithms are given a fixed budget of gradient computations, set here to 20n, where n is the size of the training set made available to the learner. Mini-batch sizes ranging over {5, 10, 15, 20} and pre-fixed step sizes ranging over {0.0001, 0.001, 0.01, 0.05, 0.10, 0.15, 0.20} are tested. (A small sketch of the 20n budget arithmetic appears after the table.) |
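
For clarity, here is a minimal, self-contained sketch of the Algorithm 1 loop quoted in the Pseudocode row. The `rescale` and `locate` helpers below merely stand in for the paper's Eqns. (3)–(5): the Catoni-type influence function, the scale rule, and the fixed-point solver are assumptions chosen for illustration, not the authors' exact formulas.

```python
import numpy as np

def psi_catoni(x):
    # Catoni-type soft-truncation influence function (an assumed choice;
    # the paper's psi in Eqn. (3) may differ in its exact form).
    return np.sign(x) * np.log1p(np.abs(x) + x**2 / 2.0)

def rescale(G, delta=0.005):
    # Per-coordinate scale estimate (stand-in for the paper's Eqn. (4)):
    # a sample standard deviation inflated by a confidence-dependent factor,
    # so truncation becomes milder as the sample size n grows.
    n, _ = G.shape
    s = np.sqrt(n / (2.0 * np.log(1.0 / delta)))
    return G.std(axis=0, ddof=1) * s + 1e-12

def locate(G, sigma, n_fixed_point=20):
    # Per-coordinate robust M-estimate of the gradient mean (stand-in for
    # Eqns. (3), (5)), computed by a simple fixed-point iteration.
    theta = G.mean(axis=0)
    for _ in range(n_fixed_point):
        theta = theta + sigma * psi_catoni((G - theta) / sigma).mean(axis=0)
    return theta

def robust_gradient_descent(grad_fn, w0, data, T=25, alpha=0.1, delta=0.005, tol=1e-3):
    # Outline of Algorithm 1: at each step, collect the per-example loss
    # gradients, robustly estimate their mean, and take a plain GD step.
    w = np.array(w0, dtype=float)
    for _ in range(T):
        G = np.stack([grad_fn(w, z) for z in data])  # D(t): per-example gradients
        sigma = rescale(G, delta)                    # sigma_hat(t) = RESCALE(D(t))
        theta = locate(G, sigma)                     # theta_hat(t) = LOCATE(D(t), sigma_hat(t))
        w = w - alpha * theta                        # w(t+1) = w(t) - alpha(t) * theta_hat(t)
        if np.all(np.abs(theta) < tol):              # stopping rule mentioned in the setup
            break
    return w

if __name__ == "__main__":
    # Toy linear regression with heavy-tailed noise (illustrative data only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = X @ np.array([1.0, -2.0]) + rng.standard_t(df=2.1, size=500)
    data = list(zip(X, y))
    grad_fn = lambda w, z: (w @ z[0] - z[1]) * z[0]  # per-example squared-loss gradient
    # The paper initializes RGD at the OLS solution; zeros are used here for brevity.
    w_hat = robust_gradient_descent(grad_fn, w0=np.zeros(2), data=data, T=25, alpha=0.1)
    print(w_hat)
```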
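
The Experiment Setup row also notes a fixed budget of 20n gradient computations shared across learners, together with a grid of mini-batch sizes. The snippet below sketches how that budget translates into update-step counts; the helper name and the n = 500 example are assumptions for illustration.

```python
# Each learner gets 20 * n individual gradient evaluations, so a mini-batch
# of size b allows roughly 20 * n / b update steps.
def iterations_for_budget(n_train, batch_size, budget_multiplier=20):
    budget = budget_multiplier * n_train  # total per-example gradient evaluations
    return budget // batch_size           # number of mini-batch update steps

for b in (5, 10, 15, 20):
    print(f"batch size {b}: {iterations_for_budget(500, b)} steps")
```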