No One Representation to Rule Them All: Overlapping Features of Training Methods
Authors: Raphael Gontijo-Lopes, Yann Dauphin, Ekin Dogus Cubuk
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets. We find that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. |
| Researcher Affiliation | Industry | Raphael Gontijo-Lopes, Yann Dauphin & Ekin D. Cubuk Google Research, Brain Team {iraphael,ynd,cubuk}@google.com |
| Pseudocode | No | The paper describes methods and processes in narrative text and uses figures to present results and conceptual diagrams, but it does not contain any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We conduct a large-scale empirical study of 82 models, which we train or collect, across hyper-parameters, architectures, objective functions, and datasets, including the latest high performing models CLIP, ALIGN, SimCLR, BiT, ViT-G/14, and MPL. In addition to using different techniques, these new models were trained on data collected very differently, allowing us to probe the effect of both training objective, as well as pre-training data. We fix ResNet-50, trained with RandAugment, as our base model. ResNet is a good candidate for a base model since it is one of the most typical ImageNet classification models, and the de-facto standard baseline for this task. ...trained on WIT (Radford et al., 2021), the ALIGN dataset, JFT (Sun et al., 2017), etc. ...linearly evaluate them on Pascal VOC (Everingham et al., 2010) |
| Dataset Splits | No | The paper mentions calibrating models using temperature scaling for maximizing ensemble performance and refers to models being in a 'narrow accuracy range (74-78% accuracy on ImageNet)'. It discusses 'test-set examples' but does not specify the explicit train/validation/test dataset splits, percentages, or sample counts needed for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions general tools like L-BFGS and RandAugment, and models/frameworks like ResNet, SimCLR, CLIP, etc., but it does not specify any software versions for programming languages, libraries, or specific deep learning frameworks (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We collect representations and predictions for 82 models, across the many categories above. We fix ResNet-50, trained with RandAugment, as our base model. ... We found it necessary to calibrate all models using temperature scaling (Roelofs et al., 2020; Guo et al., 2017) to maximize ensemble performance. ... We collect models in the categories: 1) Reinit; 2) Hyperparameters (51): varying dropout, dropblock, learning rate, and weight decay, sometimes jointly; 3) Architectures (17): including EfficientNet, ViT, DenseNet, VGG; 4) Framework (2): including SimCLR, and models trained with distillation; and 5) Dataset (12): including CLIP, ALIGN, BiT, and more, trained on WIT (Radford et al., 2021), the ALIGN dataset, JFT (Sun et al., 2017), etc. |
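
The Research Type row above reports that model pairs which diverge more in training methodology make increasingly uncorrelated errors. A minimal sketch of one way to quantify this, assuming top-1 predictions and ground-truth labels are available as integer arrays (the function name and the exact metric are illustrative, not taken from the paper):

```python
import numpy as np

def error_inconsistency(preds_a, preds_b, labels):
    """Fraction of test examples where exactly one of the two models is correct.

    Higher values mean the two models fail on different examples,
    i.e. their errors are less correlated and an ensemble has more to gain.
    """
    correct_a = preds_a == labels
    correct_b = preds_b == labels
    return np.logical_xor(correct_a, correct_b).mean()

# Hypothetical usage with top-1 predictions from two ImageNet classifiers.
labels = np.random.randint(0, 1000, size=50000)        # placeholder labels
preds_base = np.random.randint(0, 1000, size=50000)    # placeholder base-model predictions
preds_other = np.random.randint(0, 1000, size=50000)   # placeholder comparison-model predictions
print(error_inconsistency(preds_base, preds_other, labels))
```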
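
The Open Datasets and Software Dependencies rows mention linear evaluation on Pascal VOC and the use of L-BFGS. A minimal sketch of a linear probe fit with an L-BFGS solver on frozen features, assuming single-label targets and scikit-learn; the paper's exact protocol (e.g., how Pascal VOC's multi-label annotations are handled) is not specified here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical frozen features from a pre-trained backbone, with class labels.
train_features = np.random.randn(1000, 2048)        # placeholder feature vectors
train_labels = np.random.randint(0, 20, size=1000)  # placeholder labels (20 VOC classes)
test_features = np.random.randn(200, 2048)
test_labels = np.random.randint(0, 20, size=200)

# Linear probe: a logistic-regression head fit with the L-BFGS solver on
# frozen representations; the backbone itself is never updated.
probe = LogisticRegression(solver="lbfgs", max_iter=1000, C=1.0)
probe.fit(train_features, train_labels)
print("linear-eval accuracy:", probe.score(test_features, test_labels))
```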
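
The Experiment Setup row notes that all models were calibrated with temperature scaling (Guo et al., 2017) to maximize ensemble performance. A minimal sketch of that calibration step, assuming held-out logits and labels are available; variable names and the bounded search range are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax, softmax

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of temperature-scaled logits."""
    log_probs = log_softmax(logits / temperature, axis=1)
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Scalar temperature minimizing NLL on held-out data (Guo et al., 2017 style)."""
    result = minimize_scalar(lambda t: nll(logits, labels, t),
                             bounds=(0.05, 10.0), method="bounded")
    return result.x

# Hypothetical usage: calibrate each model, then average the calibrated
# probabilities across models to form an ensemble.
logits = np.random.randn(5000, 1000)            # placeholder held-out logits
labels = np.random.randint(0, 1000, size=5000)  # placeholder labels
temperature = fit_temperature(logits, labels)
calibrated_probs = softmax(logits / temperature, axis=1)
```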