Surfing: Iterative Optimization Over Incrementally Trained Deep Networks
Authors: Ganlin Song, Zhou Fan, John Lafferty
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments to illustrate the performance of surfing over a sequence of networks during training compared with gradient descent over the final trained network. ... Table 1 shows the percentage of trials where the solutions $\hat{x}_T$ satisfy our criterion for successful recovery $\|\hat{x}_T - x\| < 0.01$, for different models and over three different input dimensions $k$. |
| Researcher Affiliation | Academia | Ganlin Song, Department of Statistics and Data Science, Yale University, ganlin.song@yale.edu; Zhou Fan, Department of Statistics and Data Science, Yale University, zhou.fan@yale.edu; John Lafferty, Department of Statistics and Data Science, Yale University, john.lafferty@yale.edu |
| Pseudocode | Yes | Algorithm 1 Surfing ... Algorithm 2 Projected-gradient Surfing (a minimal code sketch of the surfing loop is given after the table) |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | We mainly use the Fashion-MNIST dataset to carry out the simulations, which is similar to MNIST in many characteristics, but is more difficult to train. |
| Dataset Splits | No | The paper does not provide specific details on train/validation/test splits for the Fashion-MNIST dataset, nor does it refer to standard splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions methods such as 'Adam' and 'batch normalization' and generative models such as 'VAE', 'DCGAN', 'WGAN', and 'WGAN-GP', but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | We run surfing by taking a sequence of parameters $\theta_0, \theta_1, \dots, \theta_T$ for $T = 100$, where $\theta_0$ are the initial random parameters and the intermediate $\theta_t$'s are taken every 40 training steps, and we use Adam (Kingma and Ba, 2014) to carry out gradient descent in $x$ over each network $G_{\theta_t}$. ... The total number of iterations for networks $G_{\theta_0}, \dots, G_{\theta_{T-1}}$ is set as the 75th percentile of the iteration count required for convergence of regular Adam. These are split across the networks proportional to a deterministic schedule that allots more steps to the earlier networks, where the landscape of $G(x)$ changes more rapidly, and fewer steps to the later networks, where this landscape stabilizes. (An illustrative sketch of such a budget split follows the table.) |
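
The quoted pseudocode and experiment-setup rows describe the surfing procedure: for each network snapshot $G_{\theta_t}$ saved during training, minimize $\|G_{\theta_t}(x) - y\|^2$ over $x$ with Adam, warm-starting at the minimizer obtained for the previous snapshot. The sketch below is a minimal illustration of that loop under stated assumptions; the function name `surfing`, the `networks` list of saved generator snapshots, and the hyperparameters `inner_steps` and `lr` are illustrative and not taken from the paper.

```python
import torch

def surfing(networks, y, x0, inner_steps=200, lr=0.01):
    """Sketch of the surfing loop: solve min_x ||G_theta_t(x) - y||^2 for each
    network snapshot in training order, warm-starting each inner Adam solve at
    the previous minimizer.  `networks` is assumed to be a list of callables
    G_theta_0, ..., G_theta_T; names and hyperparameters are illustrative."""
    x = x0.detach().clone().requires_grad_(True)
    for G in networks:                      # theta_0, theta_1, ..., theta_T
        opt = torch.optim.Adam([x], lr=lr)  # the paper reports using Adam for the inner solves
        for _ in range(inner_steps):
            opt.zero_grad()
            loss = ((G(x) - y) ** 2).sum()  # squared reconstruction error for this snapshot
            loss.backward()
            opt.step()
    return x.detach()                       # approximate minimizer for the final network G_theta_T
```

In the paper the number of inner iterations varies per snapshot rather than being a fixed `inner_steps`; one hypothetical way to produce such a split is sketched next.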
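The experiment-setup row states only that the total iteration budget (the 75th percentile of plain Adam's iteration count) is split across the intermediate networks by a deterministic schedule that favors the earlier networks; the exact schedule is not given in the extract. The helper below is a hypothetical, geometrically decaying split meant only to illustrate that idea; the name `surfing_schedule`, the `decay` parameter, and the placeholder budget of 4000 iterations are assumptions.

```python
import numpy as np

def surfing_schedule(total_iters, num_networks, decay=0.95):
    """Hypothetical split of a fixed iteration budget across networks
    G_theta_0, ..., G_theta_{T-1}, giving more steps to earlier networks
    (where the landscape changes quickly) and fewer to later ones."""
    weights = decay ** np.arange(num_networks)     # geometric decay: more weight early on
    weights = weights / weights.sum()
    steps = np.floor(weights * total_iters).astype(int)
    steps[0] += total_iters - steps.sum()          # assign the rounding remainder to the first network
    return steps

# Example: a placeholder budget split over T = 100 intermediate networks.
budget = 4000  # stand-in for the empirical 75th percentile of plain Adam's iteration count
print(surfing_schedule(budget, 100)[:5])
```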