Mechanic: A Learning Rate Tuner
Authors: Ashok Cutkosky, Aaron Defazio, Harsh Mehta
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate MECHANIC on a range of large-scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms. These experiments demonstrate that, depending on the problem, MECHANIC either comes very close to, matches, or even improves upon manual tuning of learning rates. |
| Researcher Affiliation | Collaboration | Ashok Cutkosky (Boston University, Boston, MA, ashok@cutkosky.com); Aaron Defazio (Meta FAIR, New York, NY, adefazio@meta.com); Harsh Mehta (Google Research, Mountain View, CA, harshm@google.com) |
| Pseudocode | Yes | Algorithm 1 MECHANIC and Algorithm 2 TUNER (theoretically tractable version); a simplified sketch of the main loop appears after this table. |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in the paper was found. The paper does not mention a code repository, supplementary material containing code, or an anonymous review link for code. |
| Open Datasets | Yes | We perform BERT pre-training on the Wikibooks dataset following the procedure from [35]... and We evaluate our models on the 5 largest datasets from the GLUE suite [39]. and ...finetune on ImageNet, CIFAR-10 and CIFAR-100 datasets. |
| Dataset Splits | No | No explicit statement detailing the percentages, sample counts, or methodology used to create validation splits was found. While "validation scores" are mentioned, the exact split information needed for reproduction is not provided in the text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., library or solver names with their specific versions) needed to replicate the experiments were found in the paper. |
| Experiment Setup | Yes | Table 7: Critical hyperparameters we used for BERT pre-training. and Table 11: Critical hyperparameters we used for all the experiments... and Table 13: Hyperparameters for tuning ResNet18 on CIFAR10 and Wide ResNet on CIFAR100 |
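
Since the paper provides pseudocode (Algorithm 1 MECHANIC) but no released code, the following is a minimal, single-β sketch of the scale-tuning loop to illustrate the idea: the iterate is kept as x_t = x_ref + s_t·Δ_t, where Δ_t accumulates the base optimizer's updates and the scale s_t is tuned online from the correlation h_t = ⟨g_t, Δ_t⟩. This is a reconstruction, not the authors' implementation: the multi-β aggregation and the λ-regularized h_t of Algorithm 1 are omitted, the exact update ordering may differ, and `grad_fn` / `base_update_fn` are hypothetical callables standing in for the loss gradient and the base optimizer.

```python
import numpy as np

def mechanic_sketch(grad_fn, base_update_fn, x0, steps=500,
                    s_init=1e-8, beta=0.99, eps=1e-8):
    """Simplified single-beta sketch of the MECHANIC scale-tuning loop.

    Maintains x_t = x_ref + s_t * delta_t, where delta_t is the running
    sum of base-optimizer updates and s_t is adapted online.
    """
    x_ref = np.asarray(x0, dtype=float).copy()
    delta = np.zeros_like(x_ref)  # running sum of base-algorithm updates
    v = 0.0   # decayed second moment of h_t
    r = 0.0   # "reward" accumulator, clipped at zero
    m = 0.0   # running max of |h_t|, seeds the initial wealth
    s = 0.0   # current learning-rate scale s_t
    x = x_ref.copy()
    for _ in range(steps):
        g = grad_fn(x)                      # gradient at the scaled iterate
        h = float(np.dot(g, delta))         # correlation of g_t with delta_t
        m = max(m, abs(h))
        v = beta ** 2 * v + h ** 2
        r = max(0.0, beta * r - s * h)      # never bet below zero wealth
        wealth = s_init * m + r
        s = wealth / (np.sqrt(v) + eps) if v > 0 else s_init
        delta += base_update_fn(g)          # base optimizer's raw update
        x = x_ref + s * delta               # rescale; base alg is not re-run
    return x

# Toy usage: untuned fixed-step SGD as the base algorithm on a quadratic.
if __name__ == "__main__":
    grad = lambda x: 2.0 * (x - 3.0)        # gradient of f(x) = ||x - 3||^2
    base = lambda g: -0.1 * g               # base update with arbitrary step
    print(mechanic_sketch(grad, base, np.zeros(5)))
```

The key design point the sketch preserves is that MECHANIC never chooses a learning rate directly: it only rescales the base algorithm's accumulated update direction, so the base optimizer (SGD, AdamW, etc.) runs unmodified underneath.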