Proximal Backpropagation

Authors: Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude by demonstrating promising numerical results and show that ProxProp can be effectively combined with common first-order optimizers such as Adam. We study the behavior of ProxProp in comparison to classical BackProp for a supervised visual learning problem on the CIFAR-10 dataset.
Researcher Affiliation | Academia | Technical University of Munich; University of Siegen
Pseudocode | Yes | Algorithm 1 (Penalty formulation of BackProp); Algorithm 2 (ProxProp)
Open Source Code | Yes | https://github.com/tfrerix/proxprop
Open Datasets | Yes | We study the behavior of ProxProp in comparison to classical BackProp for a supervised visual learning problem on the CIFAR-10 dataset.
Dataset Splits | Yes | We used a subset of 45000 images for training while keeping 5000 images as a validation set.
Hardware Specification | Yes | All numerical experiments reported below were conducted on an NVIDIA Titan X GPU.
Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper states: 'We chose PyTorch for our implementation'.
Experiment Setup | Yes | We used a subset of 45000 images for training while keeping 5000 images as a validation set. We initialized the parameters θ_l uniformly in [−1/√(n_{l−1}), 1/√(n_{l−1})], the default initialization of PyTorch. Figure 2 shows the decay of the full batch training loss over epochs (left) and training time (middle) for a Nesterov momentum based optimizer using a momentum of µ = 0.95 and minibatches of size 500. We used τ_θ = 0.05 for the ProxProp variants along with τ = 1. For BackProp we chose τ = 0.05 as the optimal value we found in a grid search.
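
The Pseudocode row refers to Algorithm 2 (ProxProp), which replaces the explicit gradient step on each layer's parameters with an implicit (proximal) step. The following is a minimal sketch of such a proximal update for a single bias-free dense layer computing Z = W A, not the authors' implementation (see the linked repository for that): `A` denotes the layer input, `Z_target` the penalty target from the backward pass, and `tau_theta` the proximal step size; a direct solve is used here for brevity, whereas the paper discusses solving such systems approximately (e.g. with a few conjugate-gradient iterations).

```python
import torch

def prox_linear_update(W, A, Z_target, tau_theta):
    """Implicit (proximal) update for a bias-free dense layer Z = W @ A.

    Solves  min_W  0.5 * ||W @ A - Z_target||^2 + 1/(2*tau_theta) * ||W - W_old||^2,
    whose optimality condition is the linear system
        W_new @ (A @ A.T + (1/tau_theta) * I) = Z_target @ A.T + (1/tau_theta) * W_old.
    Shapes: W is (n_out, n_in), A is (n_in, batch), Z_target is (n_out, batch).
    """
    n_in = A.shape[0]
    eye = torch.eye(n_in, dtype=W.dtype, device=W.device)
    lhs = A @ A.T + eye / tau_theta          # (n_in, n_in), symmetric positive definite
    rhs = Z_target @ A.T + W / tau_theta     # (n_out, n_in)
    # Solve W_new @ lhs = rhs; since lhs is symmetric, solve lhs @ W_new.T = rhs.T.
    return torch.linalg.solve(lhs, rhs.T).T
```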
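
The Dataset Splits row reports a 45000/5000 train/validation split of the CIFAR-10 training set and the Experiment Setup row reports minibatches of size 500. A minimal sketch of such a split with torchvision is shown below; the data path, transform, and the use of a random (rather than sequential) split are illustrative assumptions, not details from the paper.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR-10 training set (50,000 images), split 45,000 / 5,000 as in the quoted setup.
full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor()
)
train_set, val_set = torch.utils.data.random_split(
    full_train, [45000, 5000], generator=torch.Generator().manual_seed(0)
)

# Minibatches of size 500, as reported for the experiments.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=500, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=500, shuffle=False)
```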
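
The Experiment Setup row also quotes the initialization (uniform in [−1/√(n_{l−1}), 1/√(n_{l−1})], PyTorch's classical default for linear layers) and a Nesterov momentum optimizer with µ = 0.95. The sketch below reproduces only those quoted settings; the model architecture is a hypothetical fully connected network (only the 3×32×32 input and the 10-class output are fixed by CIFAR-10), and lr = 0.05 corresponds to the τ reported for the BackProp baseline.

```python
import math
import torch
import torch.nn as nn

# Hypothetical fully connected CIFAR-10 model; hidden sizes are illustrative.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Uniform initialization in [-1/sqrt(n_{l-1}), 1/sqrt(n_{l-1})], with the bound
# derived from each layer's fan-in.
for m in model.modules():
    if isinstance(m, nn.Linear):
        bound = 1.0 / math.sqrt(m.in_features)
        nn.init.uniform_(m.weight, -bound, bound)
        nn.init.uniform_(m.bias, -bound, bound)

# Nesterov momentum optimizer with the quoted momentum mu = 0.95.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.95, nesterov=True)
```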