Deep Learning as a Mixed Convex-Combinatorial Optimization Problem

Authors: Abram L. Friesen, Pedro Domingos

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show that our algorithm improves classification accuracy in a number of settings, including for AlexNet and ResNet-18 on ImageNet, when compared to the straight-through estimator."
Researcher Affiliation | Academia | "Abram L. Friesen and Pedro Domingos, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA. {afriesen,pedrod}@cs.washington.edu"
Pseudocode | Yes | "Algorithm 1: Train an ℓ-layer hard-threshold network Y = f(X; W) on dataset D = (X, T_ℓ) with feasible target propagation (FTPROP) using loss functions L = {L_d}_{d=1}^ℓ." (A minimal sketch of the related straight-through backward pass appears below the table.)
Open Source Code | Yes | "Code for the experiments is available at https://github.com/afriesen/ftprop."
Open Datasets | Yes | "We tested these training methods on the CIFAR-10 (Krizhevsky, 2009) and ImageNet (ILSVRC 2012) (Russakovsky et al., 2015) datasets."
Dataset Splits | Yes | "On CIFAR-10, which has 50K training images and 10K test images divided into 10 classes... On ImageNet, a much more challenging dataset with roughly 1.2M training images and 50K validation images divided into 1000 classes."
Hardware Specification | Yes | "All experiments were performed using PyTorch (http://pytorch.org/). CIFAR-10 experiments with the 4-layer convolutional network were performed on an NVIDIA Titan X. All other experiments were performed on NVIDIA Tesla P100 devices in a DGX-1."
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number for it or any other key software components.
Experiment Setup | Yes | "Adam (Kingma & Ba, 2015) with learning rate 2.5e-4 and weight decay 5e-4 was used to minimize the cross-entropy loss for 300 epochs. The learning rate was decayed by a factor of 0.1 after 200 and 250 epochs." (Sketched in PyTorch below the table.)
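
The Pseudocode row above quotes the caption of Algorithm 1 (FTPROP), which trains a hard-threshold network by setting per-layer targets and optimizing each layer against its own loss. The paper relates this framework to the baseline it compares against: a saturated straight-through estimator arises as a special case of the mini-batch variant with a particular per-layer loss. Below is a minimal, hypothetical PyTorch sketch of a hard-threshold (sign) activation with such a saturated straight-through backward pass; the class name SignSTE and the unit saturation window are illustrative assumptions, not code from the authors' repository.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Hard-threshold (sign) activation with a saturated
    straight-through backward pass (illustrative sketch)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map inputs to {-1, +1}; ties at 0 are resolved to +1 here,
        # a common convention for hard-threshold units.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Saturated straight-through: pass the incoming gradient only
        # where |x| <= 1 (assumed saturation window), zero it elsewhere.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


# Usage: gradients flow only through non-saturated inputs.
x = torch.randn(4, requires_grad=True)
y = SignSTE.apply(x)
y.sum().backward()
print(x.grad)  # nonzero only where |x| <= 1
```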
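
The Experiment Setup row fully specifies the optimizer and schedule. As a sketch, the same configuration in PyTorch might look as follows; `model` and `train_loader` are hypothetical placeholders (the actual experiments used a 4-layer convnet, AlexNet, and ResNet-18 on CIFAR-10 / ImageNet), while the hyperparameters are taken directly from the quote.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's networks and data loaders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]

# As reported: Adam, lr 2.5e-4, weight decay 5e-4, cross-entropy loss,
# 300 epochs, lr decayed by 0.1 after epochs 200 and 250.
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[200, 250], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(300):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # applies the 0.1 decay at epochs 200 and 250
```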