Amortized Proximal Optimization
Authors: Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger B. Grosse
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically test APO for online adaptation of learning rates and structured preconditioning matrices for regression, image reconstruction, image classification, and natural language translation tasks. |
| Researcher Affiliation | Academia | Juhan Bae (1,2), Paul Vicol (1,2), Jeff Z. HaoChen (3), Roger Grosse (1,2); 1 University of Toronto, 2 Vector Institute, 3 Stanford University |
| Pseudocode | Yes | Algorithm 1: Amortized Proximal Optimization (APO), meta-learning optimization parameters ϕ (see the sketch after this table). |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include code in the supplementary material. |
| Open Datasets | Yes | UCI Regression. Next, we validated APO-Precond on the Slice, Protein, and Parkinsons datasets from the UCI regression collection [18]. Citation [18] is 'UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.' |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] We provide the training details for all of our experiments in Appendix C. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We provide these details in Appendix C.1. Appendix C.1 states: 'Compute Infrastructure. All experiments were performed on Google Cloud Platform with NVIDIA Tesla V100 GPUs and TPUs.' |
| Software Dependencies | No | The paper mentions using PyTorch [67], JAX [14], and fairseq [65] but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We trained LeNet [40], AlexNet [37], VGG-16 [71] (w/o batch norm [32]), ResNet-18, and ResNet-32 [29] architectures for 200 epochs on batches of 128 images. For ResNet-32, we trained for 400 epochs, and the decayed baseline used a step schedule with 10× decay at epochs 150 and 250, following [49]. |
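
The pseudocode row above quotes the caption of the paper's Algorithm 1, which meta-learns the optimization parameters ϕ (e.g., a learning rate or a structured preconditioner) by descending a one-step proximal meta-objective. The snippet below is a minimal, hedged sketch of that idea for a single scalar learning rate on a toy regression problem. The toy task, the weight-space-only proximity term, and all values (lam_prox, the Adam meta-learning rate) are illustrative assumptions rather than the authors' released code, and the paper's function-space dissimilarity term is omitted for brevity.

```python
# Sketch of an APO-style amortized meta-update for a log learning rate.
# Assumptions: toy linear regression, weight-space proximity penalty only.
import torch

torch.manual_seed(0)

# Toy linear-regression data: y = X w* + noise (illustrative only).
X = torch.randn(256, 10)
w_true = torch.randn(10, 1)
y = X @ w_true + 0.1 * torch.randn(256, 1)

w = torch.zeros(10, 1)                            # base parameters theta
log_lr = torch.tensor(-3.0, requires_grad=True)   # optimization parameter phi (log learning rate)
meta_opt = torch.optim.Adam([log_lr], lr=1e-2)    # optimizer over phi (assumed choice)
lam_prox = 1.0                                    # weight-space proximity weight (assumed value)

def loss_fn(weights, xb, yb):
    return ((xb @ weights - yb) ** 2).mean()

for step in range(500):
    idx = torch.randint(0, X.shape[0], (32,))
    xb, yb = X[idx], y[idx]

    # Gradient of the base loss at the current parameters.
    w.requires_grad_(True)
    g = torch.autograd.grad(loss_fn(w, xb, yb), w)[0]
    w = w.detach()

    # One-step lookahead update, differentiable with respect to phi only.
    w_prop = w - torch.exp(log_lr) * g

    # Proximal meta-objective: loss after the step plus a penalty on how far it moves.
    # (The paper also uses a function-space dissimilarity term, omitted here.)
    meta_obj = loss_fn(w_prop, xb, yb) + lam_prox * ((w_prop - w) ** 2).sum()
    meta_opt.zero_grad()
    meta_obj.backward()
    meta_opt.step()

    # Actual base-optimizer step using the freshly updated learning rate.
    with torch.no_grad():
        w = w - torch.exp(log_lr) * g
```

The same amortized scheme described in the paper extends to structured preconditioning matrices; only the way the lookahead update `w_prop` is parameterized by ϕ would change in this sketch.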