Training Over-parameterized Models with Non-decomposable Objectives
Authors: Harikrishna Narasimhan, Aditya K. Menon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives. We trained ResNet-56 models on CIFAR-10 and CIFAR-100, and ResNet-18 models on Tiny ImageNet, using SGD with momentum. |
| Researcher Affiliation | Industry | Harikrishna Narasimhan Google Research, Mountain View hnarasimhan@google.com Aditya Krishna Menon Google Research, New York adityakmenon@google.com |
| Pseudocode | Yes | Algorithm 1 Reductions-based Algorithm for Maximizing Worst-case Recall (1) |
| Open Source Code | No | Code will be made available at: https://github.com/google-research/google-research/tree/master/non_decomp |
| Open Datasets | Yes | We trained ResNet-56 models on CIFAR-10 and CIFAR-100, and ResNet-18 models on Tiny ImageNet... [47] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. [52] Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 2015. |
| Dataset Splits | Yes | In each case, we use a balanced validation sample of 5000 held-out images, and a balanced test set of the same size. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., library or solver names with versions). |
| Experiment Setup | Yes | We trained ResNet-56 models on CIFAR-10 and CIFAR-100, and ResNet-18 models on Tiny ImageNet, using SGD with momentum. We provide details about our hyper-parameters choices in Appendix E. For the CIFAR datasets, we perform 32 SGD steps on the cost-sensitive loss for every update on G, and for Tiny ImageNet, we perform 100 SGD steps for every update on G. *(A sketch of this alternating schedule follows the table.)* |
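
The Experiment Setup row describes an alternating schedule: several SGD steps on a cost-sensitive (per-class weighted) loss, then one update of the class-weighting variable G. Below is a minimal PyTorch-style sketch of that loop structure only, assuming a class-weighted cross-entropy surrogate and an exponentiated-gradient update of G from validation recalls; the function names (`train_worst_case_recall`, `per_class_recall`), the step size `eg_step`, and the specific surrogate are illustrative assumptions, not the paper's Algorithm 1, whose exact procedure and hyper-parameters are given in the paper and its Appendix E.

```python
import torch
import torch.nn.functional as F

def per_class_recall(model, loader, num_classes, device):
    """Estimate recall for each class on a held-out (validation) loader."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            preds = model(x.to(device)).argmax(dim=1).cpu()
            for c in range(num_classes):
                mask = (y == c)
                total[c] += mask.sum()
                correct[c] += (preds[mask] == c).sum()
    return correct / total.clamp(min=1)

def train_worst_case_recall(model, train_loader, val_loader, num_classes,
                            device, outer_updates=100, inner_steps=32,
                            lr=0.1, momentum=0.9, eg_step=1.0):
    """Alternate inner SGD steps on a weighted loss with updates to G (sketch)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    # G: a distribution over classes acting as per-class costs (uniform init).
    G = torch.full((num_classes,), 1.0 / num_classes)
    train_iter = iter(train_loader)
    for _ in range(outer_updates):
        model.train()
        for _ in range(inner_steps):  # e.g. 32 steps per G update on CIFAR
            try:
                x, y = next(train_iter)
            except StopIteration:
                train_iter = iter(train_loader)
                x, y = next(train_iter)
            logits = model(x.to(device))
            # Cost-sensitive surrogate: class-weighted cross-entropy (illustrative).
            loss = F.cross_entropy(logits, y.to(device),
                                   weight=(num_classes * G).to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Up-weight classes with low validation recall (assumed EG-style update).
        recalls = per_class_recall(model, val_loader, num_classes, device)
        G = G * torch.exp(-eg_step * recalls)
        G = G / G.sum()
    return model, G
```

In this sketch, `inner_steps=32` mirrors the 32 SGD steps per G update reported for the CIFAR datasets (100 for Tiny ImageNet); the model and data loaders are assumed to be standard ResNet and torchvision-style objects supplied by the caller.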