Convex Deep Learning via Normalized Kernels
Authors: Özlem Aslan, Xinhua Zhang, Dale Schuurmans
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To investigate the potential of deep versus shallow convex training methods, and global versus local training methods, we implemented the approach outlined above for a three-layer model along with comparison methods. |
| Researcher Affiliation | Academia | Özlem Aslan, Dept. of Computing Science, University of Alberta, Canada (ozlem@cs.ualberta.ca); Xinhua Zhang, Machine Learning Group, NICTA and ANU (xizhang@nicta.com.au); Dale Schuurmans, Dept. of Computing Science, University of Alberta, Canada (dale@cs.ualberta.ca) |
| Pseudocode | Yes | Algorithm 1: Conditional gradient algorithm to optimize f(M1, M2) for M1, M2 ∈ M. (A generic conditional-gradient sketch is given below the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Here we tried to replicate the results of [25] on similar data sets, USPS and COIL from [41], Letter from [42], MNIST, and CIFAR-100 from [43]. |
| Dataset Splits | Yes | a given set of data (X, Y) is divided into separate training and test sets, (X_L, Y_L) and X_U, where labels are only included for the training set. (A minimal split sketch is given below the table.) |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | This loss can be naturally interpreted using the remark following Postulate 1. It encourages that the propensity of example j with respect to itself, S_jj, should be higher than its propensity with respect to other examples, S_ij, by a margin that is defined through the normalized kernel M. However, note this loss does not correspond to a linear transfer between layers, even in terms of the propensity matrix S or normalized output kernel M. As in all large margin methods, the initial loss (12) is a convex upper bound for an underlying discrete loss defined with respect to a step transfer. (An illustrative loss sketch is given below the table.) |
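
The Pseudocode row cites Algorithm 1, a conditional gradient method over normalized kernels. The sketch below is a generic Frank-Wolfe loop, not the authors' Algorithm 1 (which jointly optimizes two kernels under the paper's specific constraint set); `psd_trace_oracle` is a hypothetical linear-minimization oracle for a trace-bounded PSD set, included only to make the loop runnable.

```python
import numpy as np

def conditional_gradient(grad_f, linear_oracle, M0, n_iters=200):
    """Generic conditional-gradient (Frank-Wolfe) loop: a minimal sketch.

    `grad_f(M)` returns the gradient of the convex objective at M, and
    `linear_oracle(G)` returns the feasible point minimizing <G, M> over
    the constraint set.  The 2/(t+2) step size is the standard schedule.
    """
    M = M0.copy()
    for t in range(n_iters):
        G = grad_f(M)                      # gradient at the current iterate
        S = linear_oracle(G)               # solve the linear subproblem
        gamma = 2.0 / (t + 2.0)            # diminishing step size
        M = (1.0 - gamma) * M + gamma * S  # convex combination stays feasible
    return M

def psd_trace_oracle(G, tau=1.0):
    """Hypothetical oracle for the set {M PSD, tr(M) <= tau}: <G, M> is
    minimized by tau * v v^T, where v is the eigenvector of the smallest
    eigenvalue of G (or by 0 if that eigenvalue is nonnegative)."""
    w, V = np.linalg.eigh((G + G.T) / 2)   # symmetrize for stability
    if w[0] >= 0:
        return np.zeros_like(G)
    v = V[:, :1]
    return tau * (v @ v.T)
```

The diminishing step size keeps every iterate a convex combination of feasible points, which is what makes the method attractive for semidefinite constraint sets like the normalized kernels used here.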
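The Dataset Splits row describes a transductive setup in which labels are retained only for the training portion. A minimal illustration of such a split, using a hypothetical helper rather than the authors' preprocessing code:

```python
import numpy as np

def transductive_split(X, Y, n_labeled, seed=0):
    """Split arrays (X, Y) into a labeled training part (X_L, Y_L) and an
    unlabeled test part X_U; the test labels are simply withheld."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    labeled, unlabeled = idx[:n_labeled], idx[n_labeled:]
    X_L, Y_L = X[labeled], Y[labeled]
    X_U = X[unlabeled]          # Y[unlabeled] is not exposed to the learner
    return (X_L, Y_L), X_U
```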
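To make the margin requirement quoted in the Experiment Setup row concrete, the sketch below computes a hinge-style surrogate that penalizes any pair (i, j) for which S_jj does not exceed S_ij by a margin. The paper's loss (12) defines the margin through the normalized kernel M rather than a constant, so this is only an illustration with an assumed scalar `margin`.

```python
import numpy as np

def margin_loss(S, margin=1.0):
    """Illustrative hinge-style loss over a propensity matrix S: encourages
    S[j, j] to exceed S[i, j] by at least `margin` for every i != j."""
    diag = np.diag(S)                           # S_jj for every column j
    violations = margin - (diag[None, :] - S)   # margin - (S_jj - S_ij)
    np.fill_diagonal(violations, 0.0)           # no constraint when i == j
    return np.maximum(violations, 0.0).sum()
```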