Convex Deep Learning via Normalized Kernels

Authors: Özlem Aslan, Xinhua Zhang, Dale Schuurmans

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To investigate the potential of deep versus shallow convex training methods, and global versus local training methods, we implemented the approach outlined above for a three-layer model along with comparison methods.
Researcher Affiliation | Academia | Özlem Aslan, Dept. of Computing Science, University of Alberta, Canada (ozlem@cs.ualberta.ca); Xinhua Zhang, Machine Learning Group, NICTA and ANU (xizhang@nicta.com.au); Dale Schuurmans, Dept. of Computing Science, University of Alberta, Canada (dale@cs.ualberta.ca)
Pseudocode | Yes | Algorithm 1: Conditional gradient algorithm to optimize f(M1, M2) for M1, M2 ∈ M. (A hedged sketch of one conditional gradient step is given after this table.)
Open Source Code | No | The paper does not contain any explicit statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Here we tried to replicate the results of [25] on similar data sets, USPS and COIL from [41], Letter from [42], MNIST, and CIFAR-100 from [43].
Dataset Splits | Yes | a given set of data (X, Y) is divided into separate training and test sets, (XL, YL) and XU, where labels are only included for the training set. (A sketch of such a split appears after this table.)
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | This loss can be naturally interpreted using the remark following Postulate 1. It encourages that the propensity of example j with respect to itself, Sjj, should be higher than its propensity with respect to other examples, Sij, by a margin that is defined through the normalized kernel M. However, note this loss does not correspond to a linear transfer between layers, even in terms of the propensity matrix S or the normalized output kernel M. As in all large margin methods, the initial loss (12) is a convex upper bound for an underlying discrete loss defined with respect to a step transfer. (A hedged sketch of a margin loss of this form appears after this table.)
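
The Pseudocode row above refers to the paper's Algorithm 1, a conditional gradient method for optimizing f(M1, M2) over normalized kernels M1, M2 ∈ M. The snippet below is a minimal sketch of one generic conditional gradient (Frank-Wolfe) step, under the assumption that the feasible set is the PSD cone intersected with a trace ball; the paper's actual set M of normalized kernels, its objective f(M1, M2), and the names `linear_oracle_psd_trace` and `tau` are not taken from the paper.

```python
# Minimal sketch of one conditional gradient (Frank-Wolfe) step.
# Assumption (not from the paper): feasible set {S : S PSD, trace(S) <= tau}.
import numpy as np

def linear_oracle_psd_trace(grad, tau):
    """Return argmin over {S PSD, tr(S) <= tau} of <grad, S>."""
    eigvals, eigvecs = np.linalg.eigh(grad)   # eigenvalues in ascending order
    if eigvals[0] >= 0:
        return np.zeros_like(grad)            # zero matrix is optimal
    v = eigvecs[:, :1]                        # eigenvector of the most negative eigenvalue
    return tau * (v @ v.T)                    # rank-one extreme point

def frank_wolfe_step(M, grad, t, tau):
    """Blend the current iterate with the oracle point using the 2/(t+2) step size."""
    S = linear_oracle_psd_trace(grad, tau)
    gamma = 2.0 / (t + 2.0)
    return (1.0 - gamma) * M + gamma * S
```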
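The Dataset Splits row quotes a transductive setup in which the data (X, Y) is divided into a labeled training part (XL, YL) and an unlabeled test part XU. The sketch below illustrates such a split; the 80/20 fraction, the seed, and the function name `transductive_split` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a transductive split: labels are kept only for the training part.
import numpy as np

def transductive_split(X, Y, train_fraction=0.8, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    n_train = int(train_fraction * X.shape[0])
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    X_L, Y_L = X[train_idx], Y[train_idx]   # labeled training set
    X_U = X[test_idx]                       # test set with labels held out
    return X_L, Y_L, X_U
```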
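The Experiment Setup row describes a large-margin loss that pushes each diagonal propensity Sjj above the off-diagonal propensities Sij by a margin defined through the normalized kernel M. The sketch below is one plausible hinge-style reading of that description, taking the margin to be M[j, j] - M[i, j]; this choice is an assumption for illustration and is not the paper's loss (12).

```python
# Hedged sketch of a large-margin hinge loss over a propensity matrix S.
# Assumption (not from the paper): the margin for pair (i, j) is M[j, j] - M[i, j].
import numpy as np

def margin_hinge_loss(S, M):
    n = S.shape[0]
    total = 0.0
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            margin = M[j, j] - M[i, j]
            # Penalize when S[j, j] does not exceed S[i, j] by the margin.
            total += max(0.0, margin - (S[j, j] - S[i, j]))
    return total / n
```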