Amortized Proximal Optimization

Authors: Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger B. Grosse

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically test APO for online adaptation of learning rates and structured preconditioning matrices for regression, image reconstruction, image classification, and natural language translation tasks.
Researcher Affiliation | Academia | Juhan Bae (1,2), Paul Vicol (1,2), Jeff Z. HaoChen (3), Roger Grosse (1,2); 1 University of Toronto, 2 Vector Institute, 3 Stanford University
Pseudocode | Yes | Algorithm 1: Amortized Proximal Optimization (APO), Meta-Learning Optimization Parameters ϕ. (A hedged sketch of this kind of meta-update is given below the table.)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include code in the supplementary material.
Open Datasets | Yes | UCI Regression. Next, we validated APO-Precond on the Slice, Protein, and Parkinsons datasets from the UCI regression collection [18]. Citation [18] is 'UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.'
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] We provide the training details for all of our experiments in Appendix C.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We provide these details in Appendix C.1. Appendix C.1 states: 'Compute Infrastructure. All experiments were performed on Google Cloud Platform with NVIDIA Tesla V100 GPUs and TPUs.'
Software Dependencies | No | The paper mentions using PyTorch [67], JAX [14], and fairseq [65] but does not specify version numbers for these software components.
Experiment Setup | Yes | We trained LeNet [40], AlexNet [37], VGG-16 [71] (w/o batch norm [32]), ResNet-18, and ResNet-32 [29] architectures for 200 epochs on batches of 128 images. For ResNet-32, we trained for 400 epochs, and the decayed baseline used a step schedule with 10x decay at epochs 150 and 250, following [49]. (A sketch of this baseline schedule also follows the table.)
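
To make the Pseudocode row concrete, the following is a minimal PyTorch sketch of an APO-style meta-update that adapts a single learning rate online by descending a one-step proximal objective (the post-update training loss plus a proximity penalty). The weight-space squared-Euclidean proximity term, the constants lam and meta_lr, and the function names are assumptions made for illustration, not the paper's exact Algorithm 1.

import torch

def apo_meta_step(params, loss_fn, log_lr, lam=0.1, meta_lr=0.01):
    # One meta-update of the log-learning-rate phi = log_lr by descending
    #   Q(phi) = loss(theta - exp(phi) * grad) + lam * ||theta' - theta||^2.
    loss = loss_fn(params)
    grads = torch.autograd.grad(loss, params)                # gradient at the current parameters
    lr = log_lr.exp()                                        # keep the learning rate positive
    updated = [p - lr * g for p, g in zip(params, grads)]    # one-step update, differentiable in log_lr
    prox = sum(((u - p) ** 2).sum() for u, p in zip(updated, params))
    meta_obj = loss_fn(updated) + lam * prox                 # proximal meta-objective
    meta_grad, = torch.autograd.grad(meta_obj, log_lr)       # meta-gradient w.r.t. log_lr
    with torch.no_grad():
        log_lr -= meta_lr * meta_grad                        # adapt the learning rate online
    return log_lr

# Toy usage: fit a quadratic while adapting the learning rate each step.
theta = [torch.zeros(5, requires_grad=True)]
log_lr = torch.tensor(-3.0, requires_grad=True)
loss_fn = lambda ps: ((ps[0] - 1.0) ** 2).sum()
for _ in range(100):
    log_lr = apo_meta_step(theta, loss_fn, log_lr)           # meta-step on the learning rate
    grads = torch.autograd.grad(loss_fn(theta), theta)       # then the base optimizer step
    with torch.no_grad():
        for p, g in zip(theta, grads):
            p -= log_lr.exp() * g

The same pattern would extend to the structured preconditioning matrices mentioned in the Research Type row by replacing the scalar step exp(phi) * grad with a parameterized preconditioner applied to the gradient.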
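
For the Experiment Setup row, the baseline step schedule described there maps onto a standard PyTorch scheduler. In the sketch below, only the batch size (128), the epoch counts (200, or 400 for ResNet-32), and the 10x decay at epochs 150 and 250 come from the quoted text; the placeholder parameters, base learning rate, and momentum are assumptions.

import torch
from torch.optim.lr_scheduler import MultiStepLR

params = [torch.zeros(10, requires_grad=True)]                        # placeholder for the network parameters
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)             # base LR and momentum are assumed
scheduler = MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)  # 10x decay at epochs 150 and 250

batch_size = 128
num_epochs = 400   # 200 for the architectures other than ResNet-32

for epoch in range(num_epochs):
    # ... one pass over the training set in batches of `batch_size` ...
    scheduler.step()  # advance the step schedule once per epoch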