Learning One Convolutional Layer with Overlapping Patches
Authors: Surbhi Goel, Adam Klivans, Raghu Meka
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments: SGD vs Convotron. To further support our theoretical findings, we empirically compare the performance of SGD (Algorithm 3) with our algorithm Convotron. We measure performance based on the failure probability, that is, the fraction of runs the algorithm fails to converge on randomly initialized runs (the randomness is over both the choice of initialization for SGD and the draws from the distribution). |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Texas at Austin; ²Department of Computer Science, UCLA. |
| Pseudocode | Yes | Algorithm 1 Convotron; Algorithm 2 Convotron-No-Overlap; Algorithm 3 SGD |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper mentions generating synthetic data ( |
| Dataset Splits | No | The paper does not provide specific dataset split information for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | In the experiments, given a fixed true weight vector, for varying learning rates (increments of 0.01), we choose 50 random initializations and run the two algorithms with them as starting points. We plot the failure probability (θ = 0.1) with varying learning rate. Note that the lowest learning rate we use is 0.01 as making the learning rate too small requires high number of iterations for convergence for both algorithms. We first test the performance on a simple 1D convolution case with (n, k, d, T) = (8, 4, 1, 6000) and 2D case with (n1, n2, k1, k2, d1, d2, T) = (5, 5, 3, 3, 1, 1, 15000) on inputs drawn from a normalized (l2 norm 1) Gaussian distribution with identity covariance matrix. |
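
The evaluation protocol quoted in the Experiment Setup row (unit-norm Gaussian inputs, 50 random initializations per learning rate, failure threshold θ = 0.1) can be sketched roughly as follows. This is a minimal sketch, not the authors' code, which the paper does not release. The function names `sample_normalized_gaussian`, `failure_probability`, and `toy_run_once` are hypothetical. The stand-in learner `toy_run_once` is a plain online least-squares update on a linear target, used only so the harness has something to run; it is not Convotron and not the paper's SGD (Algorithm 3). Treating "failure" as a final parameter error above θ is likewise an assumption about how the threshold is applied.

```python
import numpy as np


def sample_normalized_gaussian(rng, n):
    """Draw x ~ N(0, I_n) and rescale it to unit l2 norm."""
    x = rng.standard_normal(n)
    return x / np.linalg.norm(x)


def failure_probability(run_once, learning_rate, num_inits=50, theta=0.1, seed=0):
    """Fraction of randomly initialized runs whose final error exceeds theta.

    run_once(rng, learning_rate) must return the final error of one run.
    Non-finite errors (divergence) are counted as failures.
    """
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(num_inits):
        final_error = run_once(rng, learning_rate)
        if not np.isfinite(final_error) or final_error > theta:
            failures += 1
    return failures / num_inits


def toy_run_once(rng, learning_rate, n=8, steps=2000):
    """Hypothetical stand-in learner (NOT Convotron, NOT the paper's SGD).

    Runs online least squares on a fixed linear target and returns the
    l2 distance between the learned and the true weight vector.
    """
    w_true = np.ones(n) / np.sqrt(n)   # fixed true weight vector
    w = rng.standard_normal(n)         # random initialization
    for _ in range(steps):
        x = sample_normalized_gaussian(rng, n)
        w = w + learning_rate * (w_true @ x - w @ x) * x
    return float(np.linalg.norm(w - w_true))


if __name__ == "__main__":
    # Sweep learning rates in increments of 0.01, starting at 0.01.
    for lr in np.arange(1, 6) * 0.01:
        print(f"lr={lr:.2f}  failure probability={failure_probability(toy_run_once, lr)}")
```

To compare two algorithms in the way the table describes, one would pass a different `run_once` for each and plot the returned fractions against the learning rate.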