Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization

Authors: Rie Johnson, Tong Zhang

ICML 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a framework of successive functional gradient optimization for training nonconvex models such as neural networks, where training is driven by mirror descent in a function space. We provide a theoretical analysis and empirical study of the training method derived from this framework. |
| Researcher Affiliation | Collaboration | (1) RJ Research Consulting, Tarrytown, New York, USA; (2) Hong Kong University of Science and Technology, Hong Kong. |
| Pseudocode | Yes | Algorithm 1: GULF in the most general form. Algorithm 2: GULF1 (h(u) = ‖u‖²/2). Algorithm 3: GULF2 (h(p) = L_y(p)). Algorithm 4: base-loop (simplified SGDR). An illustrative sketch of the GULF2 loop appears below the table. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Table 1: Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. For ImageNet, following custom, we used the official validation set as our test set. |
| Dataset Splits | Yes | Table 1: Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. A split sketch appears below the table. |
| Hardware Specification | No | The paper mentions that 'ImageNet training is resource-consuming' and discusses using 'models pre-trained on ImageNet', implying the use of computational resources. However, it does not specify any particular hardware components such as GPU or CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using 'torchvision' for pre-trained models and optimizers such as 'Adam' and 'RMSprop', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The default value of α is 0.3. We fixed the mini-batch size to 128 and used the same learning-rate decay schedule for all datasets but ImageNet. T for GULF2 and base-loop was fixed to 25 on CIFAR-10/100 and 15 on SVHN. These values are echoed in the sketch below. |
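
The dataset-split protocol quoted in the table can be illustrated with a short PyTorch sketch. This is a minimal example, assuming CIFAR-10 and a 45,000/5,000 train/development split with a fixed seed; the actual split sizes, seed, and data transforms are not stated in the excerpts above.

```python
# Minimal sketch of the reported data protocol: randomly split the official
# training set into a training set and a development set used for
# meta-parameter tuning. The 45,000/5,000 split, the fixed seed, and the
# transform are illustrative assumptions, not values stated in the paper.
import torch
import torchvision
from torchvision import transforms

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())

gen = torch.Generator().manual_seed(0)   # fixed seed so the split is reproducible
train_set, dev_set = torch.utils.data.random_split(
    full_train, [45000, 5000], generator=gen)

# Mini-batch size 128, matching the experiment-setup row above.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
dev_loader = torch.utils.data.DataLoader(dev_set, batch_size=128, shuffle=False)
```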
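The pseudocode and experiment-setup rows can be tied together with a sketch of a GULF2-style outer loop: T successive stages, each running an ordinary training loop (the "base-loop") on a stage objective. The sketch below assumes the cross-entropy case, in which the GULF2 stage objective reduces to cross-entropy against a guide that mixes the one-hot label (weight α) with the previous stage's softmax output (weight 1 − α); the helper names, per-stage epoch count, and SGD/cosine-annealing settings are illustrative assumptions rather than the paper's exact base-loop.

```python
# Minimal sketch of a GULF2-style outer loop under the cross-entropy reading
# described above. Illustrative assumptions (not from the paper): the helper
# names gulf2_train/make_model, the per-stage epoch count, and the SGD +
# cosine-annealing settings standing in for the paper's simplified SGDR.
import copy
import torch
import torch.nn.functional as F

def gulf2_train(make_model, train_loader, T=25, alpha=0.3, device="cuda"):
    model = make_model().to(device)      # current model f(theta, x)
    prev = None                          # frozen previous-stage model f_t
    for stage in range(T):               # T successive functional-gradient stages
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9, weight_decay=1e-4)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=30)
        for epoch in range(30):          # base-loop: ordinary mini-batch SGD
            for x, y in train_loader:    # mini-batch size 128 in the paper
                x, y = x.to(device), y.to(device)
                logits = model(x)
                onehot = F.one_hot(y, logits.size(1)).float()
                if prev is None:
                    # First stage: plain supervised target (the paper's
                    # initialization options may differ).
                    guide = onehot
                else:
                    with torch.no_grad():
                        guide = alpha * onehot + (1 - alpha) * F.softmax(prev(x), dim=1)
                # Cross-entropy between the current model and the mixed guide.
                loss = -(guide * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            sched.step()
        # Freeze the stage-t model; it guides the next stage.
        prev = copy.deepcopy(model).eval()
        for p in prev.parameters():
            p.requires_grad_(False)
    return model
```

Under this reading, the paper's default α = 0.3 weights the previous stage's predictions more heavily than the labels, so each stage resembles self-distillation that is gently pulled back toward the training labels.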