Learning Infinite Layer Networks Without the Kernel Trick
Authors: Roi Livni, Daniel Carmon, Amir Globerson
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we provide a toy experiment to compare our Shrinking Gradient algorithm to other random feature based methods. In particular, we consider the following three algorithms: Fixed-Random: Sample a set of r features w_1, ..., w_r and evaluate these on all the train and test points. Doubly Stochastic Gradient Descent (Dai et al., 2014): Here each training point x samples k features w_1, ..., w_k. Shrinking Gradient: This is the approach proposed here in Section 3. The results in Figure 1 show that our method indeed achieves a lower loss while working with the same feature budget. (A hedged sketch of the Fixed-Random baseline appears after this table.) |
| Researcher Affiliation | Academia | ¹Princeton University, Princeton, New Jersey, USA; ²Tel-Aviv University, Tel-Aviv, Israel. |
| Pseudocode | Yes | Algorithm 1: The SHRINKING GRADIENT algorithm. Algorithm 2: EST SCALAR PROD |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology. |
| Open Datasets | No | The training set is generated as follows. First, a training set x_1, ..., x_T ∈ R^D is sampled from a standard Gaussian. Negative values are then clipped to zero, in order to make the data sparser and more challenging for feature sampling. Next, a weight vector a ∈ R^D is chosen as a random sparse linear combination of the training points, so that the true function lies in the corresponding RKHS. Finally, the training set is labeled using y_i = a · x_i. The paper describes how a synthetic dataset was generated but does not provide access information or a citation for an existing public dataset. (A hedged NumPy sketch of this generation procedure appears after this table.) |
| Dataset Splits | No | The paper mentions a 'training set' and 'test points' but does not explicitly specify a validation set or detailed splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper states that the authors 'explored different initial step sizes and schedules for changing the step size' for the experiments but does not provide specific values for these or other hyperparameters and training settings. |
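
The synthetic data generation quoted in the Open Datasets row is concrete enough to sketch. The following NumPy snippet is a minimal, hypothetical reconstruction of that procedure; the paper excerpt does not state the training-set size T, the dimension D, the sparsity level of the weight vector, or any random seed, so the values below are placeholders.

```python
import numpy as np

# Assumed sizes -- the paper excerpt does not state T, D, or the sparsity level.
T, D = 1000, 100        # number of training points, input dimension
SPARSITY = 10           # how many training points combine into the weight vector

rng = np.random.default_rng(0)

# Sample x_1, ..., x_T from a standard Gaussian, then clip negatives to zero
# to make the data sparser, as the paper describes.
X = np.maximum(rng.standard_normal((T, D)), 0.0)

# Choose a as a random sparse linear combination of the training points,
# so that the true function lies in the corresponding RKHS.
idx = rng.choice(T, size=SPARSITY, replace=False)
coeffs = rng.standard_normal(SPARSITY)
a = coeffs @ X[idx]     # a in R^D

# Label the training set: y_i = <a, x_i>.
y = X @ a
```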
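
Of the three algorithms compared in the Research Type row, only the Fixed-Random baseline is described in enough detail to sketch: sample r features once and evaluate them on all train and test points. The snippet below is a hedged illustration, not the paper's implementation; the feature distribution, the ReLU nonlinearity, the feature budget r, and the least-squares fit of the output weights are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, r = 1000, 100, 200   # points, dimension, feature budget (assumed values)

# Toy data standing in for the synthetic set sketched above.
X = np.maximum(rng.standard_normal((T, D)), 0.0)
a = rng.standard_normal(D)
y = X @ a

# Fixed-Random: sample w_1, ..., w_r once and evaluate them on every point.
# The excerpt does not name the feature distribution or nonlinearity; Gaussian
# directions with a ReLU are assumed here purely for illustration.
W = rng.standard_normal((r, D))
Phi = np.maximum(X @ W.T, 0.0)               # feature matrix, shape (T, r)

# Fit the output weights on the fixed features by least squares.
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train MSE:", np.mean((Phi @ beta - y) ** 2))
```

Under this setup the feature budget is the single knob r, which is what makes the comparison to the Doubly Stochastic and Shrinking Gradient methods meaningful: all three spend the same number of sampled features, and the paper's Figure 1 compares the loss achieved per budget.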