Learning One Convolutional Layer with Overlapping Patches

Authors: Surbhi Goel, Adam Klivans, Raghu Meka

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments: SGD vs Convotron. To further support our theoretical findings, we empirically compare the performance of SGD (Algorithm 3) with our algorithm Convotron. We measure performance by the failure probability, that is, the fraction of randomly initialized runs on which the algorithm fails to converge (the randomness is over both the choice of initialization for SGD and the draws from the distribution).
Researcher Affiliation | Academia | ¹Department of Computer Science, University of Texas at Austin; ²Department of Computer Science, UCLA.
Pseudocode | Yes | Algorithm 1: Convotron; Algorithm 2: Convotron-No-Overlap; Algorithm 3: SGD.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper mentions generating synthetic data (
Dataset Splits | No | The paper does not provide specific dataset split information for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not list specific software dependencies or version numbers.
Experiment Setup | Yes | In the experiments, given a fixed true weight vector, for varying learning rates (in increments of 0.01), we choose 50 random initializations and run the two algorithms with them as starting points. We plot the failure probability (θ = 0.1) against the learning rate. Note that the lowest learning rate we use is 0.01, as making the learning rate too small requires a large number of iterations for convergence for both algorithms. We first test the performance on a simple 1D convolution case with (n, k, d, T) = (8, 4, 1, 6000) and on a 2D case with (n1, n2, k1, k2, d1, d2, T) = (5, 5, 3, 3, 1, 1, 15000), on inputs drawn from a Gaussian distribution with identity covariance matrix and normalized to unit l2 norm.
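The failure-probability experiment described in the Research Type and Experiment Setup rows can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code, and it rests on several assumptions the table does not state: the model output is f_w(x) = (1/p) Σ_i ReLU(w · P_i x) over p patches; k is the patch width and d the stride; labels are noiseless; the true weight vector is a random unit vector; initializations are standard normal; and a run "fails" if its final l2 distance to the true weights exceeds θ. The Convotron step w ← w + η (y − f_w(x)) Σ_i P_i x is our reading of Algorithm 1 (which itself starts from w = 0; here it starts from the random initializations, as the setup row describes), and SGD is plain stochastic gradient descent on the squared loss. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; names and modelling choices are assumptions, not the paper's code.

def patch_index(n, k, d):
    """1D overlapping patches: width k, stride d (assumed meaning of (n, k, d))."""
    return np.array([np.arange(s, s + k) for s in range(0, n - k + 1, d)])

def patch_index_2d(n1, n2, k1, k2, d1, d2):
    """2D patches of a flattened n1 x n2 input: k1 x k2 windows with strides (d1, d2)."""
    grid = np.arange(n1 * n2).reshape(n1, n2)
    return np.array([grid[i:i + k1, j:j + k2].ravel()
                     for i in range(0, n1 - k1 + 1, d1)
                     for j in range(0, n2 - k2 + 1, d2)])

def predict(W, PX):
    """f_w(x) = mean over patches of ReLU(w . P_i x), batched over independent runs."""
    return np.maximum(np.einsum("bpr,br->bp", PX, W), 0.0).mean(axis=1)

def train(alg, W0, w_star, idx, n, lr, T, rng):
    W = W0.copy()
    runs = W.shape[0]
    for _ in range(T):
        # Fresh Gaussian inputs (identity covariance) rescaled to unit l2 norm, one per run.
        X = rng.standard_normal((runs, n))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        PX = X[:, idx]                                  # (runs, patches, patch_size)
        y = np.maximum(PX @ w_star, 0.0).mean(axis=1)   # noiseless labels (assumption)
        resid = y - predict(W, PX)
        if alg == "convotron":
            # w <- w + lr * (y - f_w(x)) * sum_i P_i x; no activation derivative is used.
            G = resid[:, None] * PX.sum(axis=1)
        elif alg == "sgd":
            # Gradient step on 0.5 * (f_w(x) - y)^2 with ReLU subgradient 1[w . P_i x > 0].
            active = np.einsum("bpr,br->bp", PX, W) > 0
            G = resid[:, None] * (active[:, :, None] * PX).mean(axis=1)
        else:
            raise ValueError(f"unknown algorithm: {alg}")
        W += lr * G
    return W

def failure_probability(alg, lr, idx, n, T, runs=50, theta=0.1, seed=0):
    """Fraction of random initializations whose final weights miss w_star by more than theta."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(idx.shape[1])
    w_star /= np.linalg.norm(w_star)                    # fixed unit-norm true weight (assumption)
    W0 = rng.standard_normal((runs, idx.shape[1]))      # same seed => same starting points per algorithm
    W = train(alg, W0, w_star, idx, n, lr, T, rng)
    dist = np.linalg.norm(W - w_star, axis=1)
    failed = ~np.isfinite(dist) | (dist > theta)        # diverged runs also count as failures
    return float(failed.mean())

if __name__ == "__main__":
    idx_1d = patch_index(8, 4, 1)                       # (n, k, d) = (8, 4, 1)
    for lr in (0.01, 0.05, 0.1):
        for alg in ("convotron", "sgd"):
            p = failure_probability(alg, lr, idx_1d, n=8, T=6000)
            print(f"{alg:9s} lr={lr:.2f} failure={p:.2f}")
```

Because both algorithms are called with the same seed, they start from the same 50 initializations and see the same input draws, mirroring the "same starting points" design in the setup row. The 2D case would use `patch_index_2d(5, 5, 3, 3, 1, 1)` with `n=25` and `T=15000`. The printed numbers depend on the assumptions above and are not claimed to reproduce the paper's plots.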