Surfing: Iterative Optimization Over Incrementally Trained Deep Networks

Authors: Ganlin Song, Zhou Fan, John Lafferty

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present experiments to illustrate the performance of surfing over a sequence of networks during training compared with gradient descent over the final trained network. ... Table 1 shows the percentage of trials where the solutions x̂_T satisfy our criterion for successful recovery ‖x̂_T - x‖ < 0.01, for different models and over three different input dimensions k." (A sketch of this success tally appears after the table.)
Researcher Affiliation | Academia | Ganlin Song, Department of Statistics and Data Science, Yale University, ganlin.song@yale.edu; Zhou Fan, Department of Statistics and Data Science, Yale University, zhou.fan@yale.edu; John Lafferty, Department of Statistics and Data Science, Yale University, john.lafferty@yale.edu
Pseudocode | Yes | "Algorithm 1 Surfing ... Algorithm 2 Projected-gradient Surfing"
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | Yes | "We mainly use the Fashion-MNIST dataset to carry out the simulations, which is similar to MNIST in many characteristics, but is more difficult to train." (A loading sketch with the dataset's standard split appears after the table.)
Dataset Splits | No | The paper does not provide specific details on train/validation/test splits for the Fashion-MNIST dataset, nor does it refer to standard splits for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions methods such as Adam and batch normalization and generative model architectures such as VAE, DCGAN, WGAN, and WGAN-GP, but does not name specific software libraries or version numbers.
Experiment Setup | Yes | "We run surfing by taking a sequence of parameters θ_0, θ_1, ..., θ_T for T = 100, where θ_0 are the initial random parameters and the intermediate θ_t's are taken every 40 training steps, and we use Adam (Kingma and Ba, 2014) to carry out gradient descent in x over each network G_{θ_t}. ... The total number of iterations for networks G_{θ_0}, ..., G_{θ_{T-1}} is set as the 75th-percentile of the iteration count required for convergence of regular Adam. These are split across the networks proportional to a deterministic schedule that allots more steps to the earlier networks, where the landscape of G(x) changes more rapidly, and fewer steps to later networks, where this landscape stabilizes."
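
The surfing schedule quoted in the Experiment Setup row can be summarized in a short sketch. This is a minimal illustration, not the authors' code: the checkpoint format, learning rate, and the exact step-allocation weights are assumptions; the paper only states that Adam is run in x over each intermediate network G_{θ_t}, with more steps allotted to the earlier networks.

# Minimal sketch of the surfing loop, assuming `checkpoints` holds the saved
# parameter snapshots θ_0, ..., θ_T, `G` is the generator module, and `y` is the
# target observation. The decaying step-allocation weights are illustrative only.
import torch

def surfing(G, checkpoints, y, k, total_steps, lr=1e-2):
    x = torch.zeros(k, requires_grad=True)          # optimization variable in input space
    T = len(checkpoints) - 1
    weights = torch.tensor([1.0 / (t + 1) for t in range(T)])   # more steps early, fewer late (assumed)
    steps = (total_steps * weights / weights.sum()).round().long()
    for t in range(T):                              # surf over G_{θ_0}, ..., G_{θ_{T-1}}
        G.load_state_dict(checkpoints[t])
        opt = torch.optim.Adam([x], lr=lr)          # Adam state reset per network (assumption)
        for _ in range(int(steps[t])):
            opt.zero_grad()
            loss = ((G(x) - y) ** 2).sum()          # squared-error objective ||G_θ(x) - y||^2
            loss.backward()
            opt.step()
        # x is carried over (warm-started) to the next network in the training sequence
    G.load_state_dict(checkpoints[T])               # finish on the final trained network
    return x.detach()

In practice the final network G_{θ_T} would then be optimized to convergence from the surfed x, which the paper compares against running Adam on the final trained network from a random start.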
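For the recovery criterion quoted in the Research Type row, the reported percentages are the fraction of random trials whose final solution falls within the 0.01 threshold of the true input. A hypothetical tally (the variable names are assumptions; only the ‖x̂_T - x‖ < 0.01 threshold comes from the paper):

# Hypothetical success tally: only the 0.01 recovery threshold is taken from the paper.
import torch

def success_rate(x_hats, x_trues, tol=0.01):
    # Fraction of trials with ||x_hat_T - x_true|| < tol, reported as a percentage
    hits = [float(torch.norm(xh - xt).item() < tol) for xh, xt in zip(x_hats, x_trues)]
    return 100.0 * sum(hits) / len(hits)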
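Since the paper names Fashion-MNIST but gives no split details, reproduction would most likely fall back on the dataset's standard distribution, which ships with a fixed 60,000-image training set and 10,000-image test set. A loading sketch using torchvision (the transform and root directory are arbitrary choices):

# Standard Fashion-MNIST loaders via torchvision; the 60k/10k train/test split
# is the dataset's own convention, not something specified in the paper.
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_set = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)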