Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
Authors: Aditya Desai, Keren Zhou, Anshumali Shrivastava
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experimental Evaluation Setup: In this section, we evaluate the ROAST compression approach on two types of tasks. The details of the tasks, datasets, and models used are mentioned in Table 2. For image-classification tasks, we choose the CIFAR-10 dataset and the leader of the DAWNBench benchmark (Coleman et al., 2017), a ResNet-9 model, for CIFAR-10. |
| Researcher Affiliation | Collaboration | Aditya Desai (1), Keren Zhou (1), Anshumali Shrivastava (1,2); (1) Department of Computer Science, Rice University, Houston, Texas, United States; (2) ThirdAI Corp, Houston, Texas, United States. |
| Pseudocode | Yes | The pseudocode for ROAST-MM is shown in Algorithm 1. (A minimal sketch of the tile-hashing access pattern is given below the table.) |
| Open Source Code | Yes | The ROAST-MM kernel implementation is open-source: https://github.com/apd10/RzLinear/tree/stable |
| Open Datasets | Yes | For image-classification tasks, we choose the CIFAR-10 dataset and the leader of the DAWNBench benchmark (Coleman et al., 2017), a ResNet-9 model, for CIFAR-10. ... We use the two largest available text-classification datasets for NLP tasks on Hugging Face (Hugging Face, 2022). |
| Dataset Splits | No | Table 2 provides the number of training and test samples for each dataset (e.g., 'amazon-polarity 3.6M/0.4M', which implies a train/test split), but it does not explicitly report a separate validation split or its sample count. (The split-inspection snippet below the table shows how these splits can be checked.) |
| Hardware Specification | Yes | The measurements were taken using TF32 on an NVIDIA A100 GPU (48GB). |
| Software Dependencies | No | The paper mentions software such as Triton, cuBLAS, CUTLASS, and PyTorch, but does not specify their version numbers, which are necessary for reproducibility. |
| Experiment Setup | Yes | The other hyperparameters for NLP tasks are {batch size 64 for amazon-polarity and 32 for yelp-polarity, learning rate 2e-5, AdamW optimizer, linear scheduler}. Pruning is used as a baseline: iterative magnitude pruning interspersed with training, with two schedules. The full-9-1 schedule (alt. full-1-9) means we start with the fully trained model, perform iterative magnitude pruning to reach the required sparsity over 9 (alt. 1) epochs, and finally perform 1 (alt. 9) epoch(s) at the final sparsity. (A configuration and pruning-schedule sketch is given below the table.) |
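
ROAST-MM (Algorithm 1 in the paper) is a matrix multiplication whose weight tiles are fetched from a single shared compressed parameter array via hashing, so each block is read as one contiguous chunk. The snippet below is a minimal, non-optimized PyTorch emulation of that tile-hashing access pattern; the fixed random block offsets, the tile size, and the omission of the paper's sign/scaling details are simplifications of my own, and the authors' actual implementation is the Triton kernel in the RzLinear repository linked above.

```python
import torch

def roast_mm_reference(x, comp_params, out_features, tile=64, seed=0):
    """Non-optimized emulation of a ROAST-style hashed matmul (sketch only).

    Every (tile x tile) block of the virtual weight matrix is read as a
    contiguous chunk of the shared compressed array `comp_params`; the chunk
    offset plays the role of the hash of the block coordinates. Contiguous
    chunk reads are what make the access pattern hardware-friendly.
    """
    in_features = x.shape[-1]
    n_row_blocks = (in_features + tile - 1) // tile
    n_col_blocks = (out_features + tile - 1) // tile
    gen = torch.Generator().manual_seed(seed)
    # Fixed random offsets stand in for the cheap hash computed on-GPU.
    offsets = torch.randint(
        0, comp_params.numel() - tile * tile,
        (n_row_blocks, n_col_blocks), generator=gen,
    )
    out = x.new_zeros(*x.shape[:-1], out_features)
    for bi in range(n_row_blocks):
        r0, r1 = bi * tile, min((bi + 1) * tile, in_features)
        for bj in range(n_col_blocks):
            c0, c1 = bj * tile, min((bj + 1) * tile, out_features)
            off = int(offsets[bi, bj])
            w_block = comp_params[off : off + tile * tile].view(tile, tile)
            out[..., c0:c1] += x[..., r0:r1] @ w_block[: r1 - r0, : c1 - c0]
    return out

# Example: an 8x512 batch against a 512x256 "virtual" weight matrix whose
# parameters all live in a 10,000-element compressed pool.
y = roast_mm_reference(torch.randn(8, 512), torch.randn(10_000), out_features=256)
```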
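For the split question above, the two corpora are available under the Hugging Face dataset IDs `amazon_polarity` and `yelp_polarity` (the exact IDs are an assumption; the paper only names the datasets). A quick way to confirm that they ship only train/test splits:

```python
from datasets import load_dataset  # Hugging Face `datasets` package

for name in ("amazon_polarity", "yelp_polarity"):
    ds = load_dataset(name)
    # Prints the available splits and their sizes; neither corpus provides a
    # dedicated validation split, so one would have to be held out manually.
    print(name, {split: len(ds[split]) for split in ds})
```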
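The NLP training configuration and the full-9-1 / full-1-9 pruning schedules quoted in the Experiment Setup row can be summarized as below. This is a hedged sketch: the dictionary layout and function names are mine, the linear sparsity ramp is one plausible reading of "iterative magnitude pruning ... in 9 epochs", and a faithful baseline would also re-apply the pruning mask after every optimizer step rather than once per epoch.

```python
import torch

# Hyperparameters reported for the NLP tasks (structure and key names are illustrative).
NLP_CONFIG = {
    "batch_size": {"amazon_polarity": 64, "yelp_polarity": 32},
    "learning_rate": 2e-5,
    "optimizer": "AdamW",
    "lr_scheduler": "linear",
}

def target_sparsity(epoch, final_sparsity, ramp_epochs=9):
    """Per-epoch sparsity target for the 'full-9-1' schedule: starting from a
    fully trained dense model, ramp linearly to the required sparsity over
    `ramp_epochs` epochs; any later epoch trains at the final sparsity
    (1 such epoch in full-9-1). Use ramp_epochs=1 for the 'full-1-9' variant."""
    return final_sparsity * min(epoch + 1, ramp_epochs) / ramp_epochs

def magnitude_prune_(model, sparsity):
    """Global unstructured magnitude pruning: zero the smallest-magnitude
    fraction `sparsity` of all weight-matrix entries, in place."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    flat = torch.cat([p.detach().abs().flatten() for p in weights])
    k = int(sparsity * flat.numel())
    if k == 0:
        return
    threshold = flat.kthvalue(k).values
    with torch.no_grad():
        for p in weights:
            p.mul_((p.abs() > threshold).to(p.dtype))

# Iterative magnitude pruning interspersed with training (full-9-1 schedule):
# for epoch in range(10):
#     magnitude_prune_(model, target_sparsity(epoch, final_sparsity=0.9))
#     train_one_epoch(model, ...)   # hypothetical training loop
```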