Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad
Authors: Sayantan Choudhury, Nazarii Tupitsa, Nicolas Loizou, Samuel Horváth, Martin Takáč, Eduard Gorbunov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. (Section 4, Numerical Experiments) In this section, we implement KATE in several machine learning tasks to evaluate its performance. |
| Researcher Affiliation | Academia | Sayantan Choudhury (MBZUAI & Johns Hopkins University); Nazarii Tupitsa (MBZUAI & Innopolis University); Nicolas Loizou (Johns Hopkins University); Samuel Horváth (MBZUAI); Martin Takáč (MBZUAI); Eduard Gorbunov (MBZUAI) |
| Pseudocode | Yes | Algorithm 1 KATE (a hedged sketch of a KATE-style update is given below the table). |
| Open Source Code | Yes | To ensure transparency and facilitate reproducibility, we provide an access to the source code for all of our experiments at https://github.com/nazya/KATE. |
| Open Datasets | Yes | We test KATE on three datasets: heart, australian, and splice from the LIBSVM library (Chang and Lin, 2011). ... training ResNet18 (He et al., 2016) on the CIFAR10 dataset (Krizhevsky and Hinton, 2009) and BERT (Devlin et al., 2018) fine-tuning on the emotions dataset (Saravia et al., 2018) from the Hugging Face Hub. (A loading sketch for these datasets appears below the table.) |
| Dataset Splits | No | The paper uses standard datasets such as CIFAR10 and the LIBSVM datasets, which typically come with predefined splits. However, it does not explicitly state the training, validation, and test splits used for its experiments (e.g., specific percentages, sample counts, or an explicit statement, with citation, that the standard splits were applied). |
| Hardware Specification | Yes | We use internal cluster with the following hardware: AMD EPYC 7552 48-Core Processor, 512GiB RAM, NVIDIA A100 40GB GPU, 200GB user storage space. |
| Software Dependencies | No | The paper mentions PyTorch as the framework whose default Adam parameters are used, but it does not specify exact version numbers for PyTorch or any other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We choose standard parameters for Adam (β1 = 0.9 and β2 = 0.999) that are default values in PyTorch and select the learning rate of 10⁻⁵ for all considered methods. We run KATE with different values of η ∈ {0, 10⁻¹, 10⁻²}. For the image classification task, we normalize the images (similar to Horváth and Richtárik (2020)) and use a mini-batch size of 500. For the BERT fine-tuning, we use a mini-batch size 160 for all methods. (A PyTorch configuration sketch of this setup appears below the table.) |
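
The report only names the paper's pseudocode ("Algorithm 1 KATE"); the algorithm itself is not quoted above. For orientation, here is a minimal NumPy sketch of what a per-coordinate "AdaGrad without the square root in the denominator" update of this kind could look like. The accumulator `m`, the parameter `eta`, and the exact update rule are assumptions made for illustration, not the paper's Algorithm 1; consult the paper or the authors' repository for the actual method.

```python
import numpy as np

def kate_like_step(x, grad, b_sq, m, lr=1e-5, eta=0.0, eps=1e-12):
    """One step of an illustrative KATE-style update (assumed form, not the
    paper's Algorithm 1): the AdaGrad denominator sqrt(sum g^2) is replaced
    by the un-square-rooted sum b_sq, compensated by an accumulated ratio m."""
    b_sq = b_sq + grad**2                            # running sum of squared gradients
    m = m + eta * grad**2 + grad**2 / (b_sq + eps)   # assumed compensating accumulator
    x = x - lr * np.sqrt(m) * grad / (b_sq + eps)    # note: divide by b_sq, not sqrt(b_sq)
    return x, b_sq, m

# toy usage on f(x) = 0.5 * ||x||^2, where grad = x
x, b_sq, m = np.ones(3), np.zeros(3), np.zeros(3)
for _ in range(100):
    x, b_sq, m = kate_like_step(x, grad=x, b_sq=b_sq, m=m, lr=0.5)
```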
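
All datasets listed in the Open Datasets row are publicly available. The following sketch shows one way they could be loaded in Python; the Hugging Face identifier `dair-ai/emotion`, the CIFAR10 normalization constants, and the choice of loaders are assumptions of this sketch, not taken from the authors' code (see https://github.com/nazya/KATE for the actual pipeline).

```python
# Loading sketch for the public datasets named in the paper (identifiers and
# loaders are assumptions; the authors' repository is the authoritative source).
from sklearn.datasets import load_svmlight_file   # LIBSVM-format files
from torchvision import datasets, transforms
from datasets import load_dataset                  # Hugging Face Hub

# LIBSVM datasets (heart, australian, splice), downloaded beforehand from
# https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
X_heart, y_heart = load_svmlight_file("heart")

# CIFAR10 with per-channel normalization (normalization constants assumed)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transform)

# Emotion dataset (Saravia et al., 2018) from the Hugging Face Hub
emotions = load_dataset("dair-ai/emotion")
```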
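
The Adam hyperparameters quoted in the Experiment Setup row correspond to PyTorch's defaults with an explicit learning rate of 10⁻⁵ and the stated mini-batch sizes. A minimal, self-contained configuration sketch follows; the random placeholder data and the short training loop are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Placeholder data standing in for normalized CIFAR10 images (3x32x32) and labels.
images = torch.randn(1000, 3, 32, 32)
labels = torch.randint(0, 10, (1000,))
# Mini-batch size of 500 for image classification (160 is quoted for BERT fine-tuning).
train_loader = DataLoader(TensorDataset(images, labels), batch_size=500, shuffle=True)

model = resnet18(num_classes=10)
# Adam with PyTorch's default betas (0.9, 0.999) and the quoted learning rate 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.9, 0.999))
criterion = torch.nn.CrossEntropyLoss()

for x, y in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```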