Mechanic: A Learning Rate Tuner

Authors: Ashok Cutkosky, Aaron Defazio, Harsh Mehta

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate MECHANIC on a range of large-scale deep learning tasks with varying batch sizes, schedules, and base optimization algorithms. These experiments demonstrate that, depending on the problem, MECHANIC either comes very close to, matches, or even improves upon manual tuning of learning rates.
Researcher Affiliation | Collaboration | Ashok Cutkosky (Boston University, Boston, MA; ashok@cutkosky.com), Aaron Defazio (Meta, FAIR, New York, NY; adefazio@meta.com), Harsh Mehta (Google Research, Mountain View, CA; harshm@google.com)
Pseudocode | Yes | Algorithm 1 MECHANIC and Algorithm 2 TUNER (theoretically tractable version); a simplified illustrative sketch follows the table below.
Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in the paper was found. The paper does not mention a code repository, supplementary material containing code, or an anonymous review link for code.
Open Datasets | Yes | We perform BERT pre-training on the Wikibooks dataset following the procedure from [35]...; We evaluate our models on the 5 largest datasets from the GLUE suite [39]; ...finetune on ImageNet, CIFAR-10 and CIFAR-100 datasets.
Dataset Splits | No | No explicit statement detailing the specific percentages, sample counts, or methodology for creating validation dataset splits was found. While "validation scores" are mentioned, the exact split information needed for reproduction is not provided in the text.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed machine specifications) used for running the experiments were provided in the paper.
Software Dependencies | No | No specific ancillary software details with version numbers (e.g., library or solver names with their specific versions) needed to replicate the experiments were found in the paper.
Experiment Setup | Yes | Table 7: Critical hyperparameters we used for BERT pre-training; Table 11: Critical hyperparameters we used for all the experiments...; Table 13: Hyperparameters for tuning ResNet18 on CIFAR10 and Wide ResNet on CIFAR100.
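
As a rough illustration of what the Pseudocode row refers to, the sketch below wraps plain SGD around a toy quadratic and learns a single multiplicative scale for the cumulative base-optimizer update online. It is a hypothetical, heavily simplified single-decay-factor sketch, not the paper's Algorithm 1: the function names, the constants, the toy objective, and the exact form and ordering of the scale update are assumptions made for illustration (the paper's method averages over several decay factors and includes additional safeguards).

```python
import numpy as np

def loss_and_grad(x):
    """Toy quadratic objective: f(x) = 0.5 * ||x - target||^2."""
    target = np.array([3.0, -2.0])
    return 0.5 * np.sum((x - target) ** 2), x - target

def mechanic_like_sgd(x0, steps=200, base_lr=1.0, beta=0.99, s_init=1e-8):
    # Hypothetical, simplified sketch; all constants and update rules here are assumptions.
    x_ref = x0.copy()          # reference point; iterates are x_ref + s * delta
    delta = np.zeros_like(x0)  # cumulative update direction produced by the base optimizer
    v = 0.0                    # decayed sum of squared "hints" h
    r = 0.0                    # accumulated reward ("wealth" earned beyond the initial stake)
    m = 0.0                    # largest hint magnitude seen so far
    s = 0.0                    # learned scale applied to delta
    for _ in range(steps):
        x = x_ref + s * delta               # current iterate
        _, g = loss_and_grad(x)
        h = float(np.dot(g, delta))         # hint: gradient / direction inner product
        m = max(m, abs(h))
        v = beta ** 2 * v + h ** 2
        r = max(0.0, beta * r - s * h)      # reward grows when moving along delta reduces loss
        wealth = s_init * m + r
        s = wealth / (np.sqrt(v) + 1e-12)   # coin-betting-style scale update
        delta = delta - base_lr * g         # base optimizer step (plain SGD) on the direction
    return x_ref + s * delta

print(mechanic_like_sgd(np.zeros(2)))
```

On this toy problem the iterate moves from the origin toward the minimizer [3, -2] without hand-tuning the overall step size, which conveys the intended behaviour of a scale-learning wrapper around a base optimizer.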