Guided Learning of Nonconvex Models through Successive Functional Gradient Optimization
Authors: Rie Johnson, Tong Zhang
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a framework of successive functional gradient optimization for training nonconvex models such as neural networks, where training is driven by mirror descent in a function space. We provide a theoretical analysis and empirical study of the training method derived from this framework. (A generic mirror-descent update is sketched after the table.) |
| Researcher Affiliation | Collaboration | (1) RJ Research Consulting, Tarrytown, New York, USA; (2) Hong Kong University of Science and Technology, Hong Kong. |
| Pseudocode | Yes | Algorithm 1: GULF in the most general form. Algorithm 2: GULF1 (h(u) = ½‖u‖²). Algorithm 3: GULF2 (h(p) = L_y(p)). Algorithm 4: base-loop (simplified SGDR). (A hedged sketch of the stage loop appears after the table.) |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Table 1. Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. For ImageNet, following custom, we used the official validation set as our test set. |
| Dataset Splits | Yes | Table 1. Data. For each dataset, we randomly split the official training set into a training set and a development set to use the development set for meta-parameter tuning. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper mentions that 'ImageNet training is resource-consuming' and discusses using 'models pre-trained on ImageNet', implying the use of computational resources. However, it does not specify any particular hardware components such as GPU or CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using 'TorchVision' for pre-trained models and optimizers like 'Adam' and 'RMSprop', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The default value of α is 0.3. We fixed the mini-batch size to 128 and used the same learning rate decay schedule for all but ImageNet. T for GULF2 and base-loop was fixed to 25 on CIFAR10/100 and 15 on SVHN. (These values are collected in the config sketch after the table.) |
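
The "mirror descent in a function space" mentioned in the Research Type row can be read as the standard mirror descent update applied to the model's output function f. The display below is generic textbook notation (step size η, distance-generating function h, Bregman divergence D_h), not the paper's exact formulation; GULF1 and GULF2 correspond to different choices of h, per the Pseudocode row.

```latex
f_{t+1} \;=\; \arg\min_{f} \Big\{ \eta \, \langle \nabla \mathcal{L}(f_t),\, f \rangle \;+\; D_h(f,\, f_t) \Big\},
\qquad
D_h(f, g) \;=\; h(f) - h(g) - \langle \nabla h(g),\, f - g \rangle .
```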
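The Pseudocode row names an outer GULF loop, variants GULF1/GULF2, and a base-loop of simplified SGDR. The sketch below illustrates only that stage-wise structure: it snapshots the model at the start of each stage and trains a combination of a task loss and a squared-L2 proximity term to the snapshot's outputs (a natural reading of h(u) = ½‖u‖²). The exact per-stage objective, optimizer, and schedule in the paper may differ; the α-weighted combination and the helper name `gulf_stage_loop` are assumptions.

```python
import copy

import torch
import torch.nn.functional as F


def gulf_stage_loop(model, train_loader, num_stages, alpha=0.3,
                    base_epochs=25, lr=0.1):
    """Illustrative stage-wise "guided" training loop (not the paper's exact
    algorithm): each stage freezes a snapshot of the current model and trains
    against a task loss plus a proximity term to that snapshot."""
    for stage in range(num_stages):
        prev = copy.deepcopy(model).eval()        # frozen snapshot from the previous stage
        for p in prev.parameters():
            p.requires_grad_(False)
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for epoch in range(base_epochs):          # stands in for the "base-loop"
            for x, y in train_loader:
                logits = model(x)
                with torch.no_grad():
                    prev_logits = prev(x)
                # Assumed guided objective: alpha pulls toward the labels,
                # (1 - alpha) keeps the new function close to the previous one.
                task = F.cross_entropy(logits, y)
                prox = 0.5 * (logits - prev_logits).pow(2).sum(dim=1).mean()
                loss = alpha * task + (1.0 - alpha) * prox
                opt.zero_grad()
                loss.backward()
                opt.step()
    return model
```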
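The split protocol quoted in the Open Datasets and Dataset Splits rows (official training set randomly divided into a training set and a development set) can be reproduced along the lines below. The 90/10 ratio, the fixed seed, and the use of CIFAR-10 via torchvision are assumptions; the paper's actual split sizes are not quoted here.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical train/dev split of an official training set.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
dev_size = len(full_train) // 10                   # assumed 10% development set
train_size = len(full_train) - dev_size
train_set, dev_set = random_split(
    full_train, [train_size, dev_size],
    generator=torch.Generator().manual_seed(0))    # assumed fixed seed
```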
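The defaults quoted in the Experiment Setup row can be collected into a small config. The dictionary layout and key names are illustrative; only the numeric values come from the quoted text, and since the quote does not say what T counts, the key is left as "T".

```python
# Meta-parameters quoted in the "Experiment Setup" row above.
GULF_DEFAULTS = {
    "alpha": 0.3,        # default value of alpha
    "batch_size": 128,   # fixed mini-batch size
    "T": {               # T for GULF2 and base-loop, per dataset
        "CIFAR10": 25,
        "CIFAR100": 25,
        "SVHN": 15,
        # ImageNet settings (schedule and T) are not quoted here.
    },
}
```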