Proximal Backpropagation
Authors: Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude by demonstrating promising numerical results and show that ProxProp can be effectively combined with common first-order optimizers such as Adam. We study the behavior of ProxProp in comparison to classical BackProp for a supervised visual learning problem on the CIFAR-10 dataset. |
| Researcher Affiliation | Academia | Technical University of Munich; University of Siegen |
| Pseudocode | Yes | Algorithm 1: Penalty formulation of BackProp; Algorithm 2: ProxProp |
| Open Source Code | Yes | https://github.com/tfrerix/proxprop |
| Open Datasets | Yes | We study the behavior of ProxProp in comparison to classical BackProp for a supervised visual learning problem on the CIFAR-10 dataset. |
| Dataset Splits | Yes | We used a subset of 45000 images for training while keeping 5000 images as a validation set. |
| Hardware Specification | Yes | all numerical experiments reported below were conducted on an NVIDIA Titan X GPU. |
| Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper states: 'We chose PyTorch for our implementation'. |
| Experiment Setup | Yes | We used a subset of 45000 images for training while keeping 5000 images as a validation set. We initialized the parameters θ_l uniformly in [−1/√n_{l−1}, 1/√n_{l−1}], the default initialization of PyTorch. Figure 2 shows the decay of the full batch training loss over epochs (left) and training time (middle) for a Nesterov momentum based optimizer using a momentum of µ = 0.95 and minibatches of size 500. We used τ_θ = 0.05 for the ProxProp variants along with τ = 1. For BackProp we chose τ = 0.05 as the optimal value we found in a grid search. (See the sketch below for these settings in code.) |
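
As a concrete reference for the setup quoted above, the following is a minimal sketch (not the authors' code) of the reported configuration in PyTorch: the 45000/5000 CIFAR-10 train/validation split, minibatches of size 500, Nesterov momentum µ = 0.95, and learning rate 0.05 as reported for the BackProp baseline. The placeholder model and training loop are assumptions for illustration only; the actual ProxProp layers are provided in the linked repository.

```python
# Minimal sketch of the reported experimental setup (assumptions marked below).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Subset

# CIFAR-10 with the split described in the paper: 45000 train / 5000 validation.
transform = transforms.ToTensor()
full_train = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
train_set = Subset(full_train, range(45000))
val_set = Subset(full_train, range(45000, 50000))

train_loader = DataLoader(train_set, batch_size=500, shuffle=True)   # minibatches of size 500
val_loader = DataLoader(val_set, batch_size=500, shuffle=False)

# Placeholder model (assumption, not the paper's architecture). PyTorch's default
# Linear initialization is uniform in [-1/sqrt(n_{l-1}), 1/sqrt(n_{l-1})],
# matching the initialization quoted above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                      nn.Linear(512, 10))

# Nesterov momentum optimizer with the reported hyperparameters
# (momentum 0.95; learning rate 0.05 for the BackProp baseline).
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.95, nesterov=True)
criterion = nn.CrossEntropyLoss()

for epoch in range(2):  # short run for illustration only
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```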