Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization

Authors: Rie Johnson, Tong Zhang

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a framework of successive functional gradient optimization for training nonconvex models such as neural networks, where training is driven by mirror descent in a function space. We provide a theoretical analysis and empirical study of the training method derived from this framework. |
| Researcher Affiliation | Collaboration | (1) RJ Research Consulting, Tarrytown, New York, USA; (2) Hong Kong University of Science and Technology, Hong Kong. |
| Pseudocode | Yes | Algorithm 1: GULF in the most general form. Algorithm 2: GULF1 (h(u) = ‖u‖²/2). Algorithm 3: GULF2 (h(p) = L_y(p)). Algorithm 4: base-loop (simplified SGDR). An illustrative sketch of the GULF2 loop appears below the table. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Table 1: Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. For ImageNet, following custom, we used the official validation set as our test set. |
| Dataset Splits | Yes | Table 1: Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. A split sketch appears below the table. |
| Hardware Specification | No | The paper mentions that 'ImageNet training is resource-consuming' and discusses using 'models pre-trained on ImageNet', implying the use of computational resources. However, it does not specify any particular hardware components such as GPU or CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using 'torchvision' for pre-trained models and optimizers such as 'Adam' and 'RMSprop', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The default value of α is 0.3. We fixed the mini-batch size to 128 and used the same learning-rate decay schedule for all datasets but ImageNet. T for GULF2 and base-loop was fixed to 25 on CIFAR-10/100 and 15 on SVHN. These values are echoed in the sketch below. |
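
The dataset-split protocol quoted in the table can be illustrated with a short PyTorch sketch. This is a minimal example, assuming CIFAR-10 and a 45,000/5,000 train/development split with a fixed seed; the actual split sizes, seed, and data transforms are not stated in the excerpts above.

```python
# Minimal sketch of the reported data protocol: randomly split the official
# training set into a training set and a development set used for
# meta-parameter tuning. The 45,000/5,000 split, the fixed seed, and the
# transform are illustrative assumptions, not values stated in the paper.
import torch
import torchvision
from torchvision import transforms

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())

gen = torch.Generator().manual_seed(0)   # fixed seed so the split is reproducible
train_set, dev_set = torch.utils.data.random_split(
    full_train, [45000, 5000], generator=gen)

# Mini-batch size 128, matching the experiment-setup row above.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev_set, batch_size=128, shuffle=False)
```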
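The pseudocode and experiment-setup rows can be tied together with a sketch of a GULF2-style outer loop: T successive stages, each running an ordinary training loop (the "base-loop") on a stage objective. The sketch below assumes the cross-entropy case, in which the GULF2 stage objective reduces to cross-entropy against a guide that mixes the one-hot label (weight α) with the previous stage's softmax output (weight 1 − α); the helper names, per-stage epoch count, and SGD/cosine-annealing settings are illustrative assumptions rather than the paper's exact base-loop.

```python
# Minimal sketch of a GULF2-style outer loop under the cross-entropy reading
# described above. Illustrative assumptions (not from the paper): the helper
# names gulf2_train/make_model, the per-stage epoch count, and the SGD +
# cosine-annealing settings standing in for the paper's simplified SGDR.
import copy
import torch
import torch.nn.functional as F

def gulf2_train(make_model, train_loader, T=25, alpha=0.3, device="cuda"):
    model = make_model().to(device)      # current model f(theta, x)
    prev = None                          # frozen previous-stage model f_t
    for stage in range(T):               # T successive functional-gradient stages
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9, weight_decay=1e-4)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=30)
        for epoch in range(30):          # base-loop: ordinary mini-batch SGD
            for x, y in train_loader:    # mini-batch size 128 in the paper
                x, y = x.to(device), y.to(device)
                logits = model(x)
                onehot = F.one_hot(y, logits.size(1)).float()
                if prev is None:
                    # First stage: plain supervised target (the paper's
                    # initialization options may differ).
                    guide = onehot
                else:
                    with torch.no_grad():
                        guide = alpha * onehot + (1 - alpha) * F.softmax(prev(x), dim=1)
                # Cross-entropy between the current model and the mixed guide.
                loss = -(guide * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            sched.step()
        # Freeze the stage-t model; it guides the next stage.
        prev = copy.deepcopy(model).eval()
        for p in prev.parameters():
            p.requires_grad_(False)
    return model
```

Under this reading, the paper's default α = 0.3 weights the previous stage's predictions more heavily than the labels, so each stage resembles self-distillation that is gently pulled back toward the training labels.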