Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
Authors: Aditya Desai, Keren Zhou, Anshumali Shrivastava
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experimental Evaluation Setup: In this section, we evaluate the ROAST compression approach on two types of tasks. The details of the tasks, datasets, and models used are mentioned in Table 2. For image-classification tasks, we choose the cifar-10 dataset and the leader for the Dawn Benchmark (Coleman et al., 2017) a ResNet-9 model for cifar-10. |
| Researcher Affiliation | Collaboration | Aditya Desai (1), Keren Zhou (1), Anshumali Shrivastava (1, 2). (1) Department of Computer Science, Rice University, Houston, Texas, United States; (2) ThirdAI Corp, Houston, Texas, United States. |
| Pseudocode | Yes | The pseudo code for ROAST-MM is shown in algorithm 1. |
| Open Source Code | Yes | ROAST-MM kernel implementation is open-source (footnote 1: https://github.com/apd10/RzLinear/tree/stable). |
| Open Datasets | Yes | For image-classification tasks, we choose the cifar-10 dataset and the leader for the Dawn Benchmark (Coleman et al., 2017) a ResNet-9 model for cifar-10. ... We use the two largest available text-classification datasets for NLP tasks on huggingface (Hugging Face, 2022). |
| Dataset Splits | No | Table 2 provides the number of samples for training and testing for each dataset (e.g., 'amazon-polarity 3.6M/0.4M' which implies train/test split). However, it does not explicitly provide specific split information or sample counts for a separate validation dataset. |
| Hardware Specification | Yes | The measurements were taken using TF32 on a NVIDIA A100 GPU (48GB). |
| Software Dependencies | No | The paper mentions software like Triton, cuBLAS, CUTLASS, and PyTorch, but does not specify their version numbers, which is necessary for reproducibility. |
| Experiment Setup | Yes | The other hyperparameters for NLP tasks are {batch 64 for amazon-polarity and 32 for yelp-polarity, learning rate 2e-5, AdamW optimizer, Linear scheduler}. Pruning is used as a baseline. We use iterative magnitude pruning interspersed with training. We use two schedules for pruning. full-9-1-schedule (alt. full-1-9-schedule) means we start with the fully trained model and then perform iterative magnitude pruning to require sparsity in 9 (alt. 1) epochs and finally perform 1 (alt. 9) epoch at final sparsity. |
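
The two pruning schedules quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes a linear ramp of the sparsity target across the pruning epochs and a final sparsity of 0.9; neither value is stated in the quoted text. The helper names `sparsity_schedule` and `magnitude_mask` are hypothetical and are not taken from the paper or its repository.

```python
def sparsity_schedule(final_sparsity, prune_epochs, hold_epochs):
    """Per-epoch sparsity targets: ramp up during pruning, then hold.

    Assumption: the ramp is linear. The quoted setup only says the required
    sparsity is reached within `prune_epochs` epochs.
    """
    ramp = [final_sparsity * (e + 1) / prune_epochs for e in range(prune_epochs)]
    return ramp + [final_sparsity] * hold_epochs


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    k = int(round(sparsity * len(weights)))  # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(weights))]


# Illustrative final sparsity of 0.9 (an assumption, not a value from the source).
full_9_1 = sparsity_schedule(0.9, prune_epochs=9, hold_epochs=1)
full_1_9 = sparsity_schedule(0.9, prune_epochs=1, hold_epochs=9)

# Toy weight vector: at 50% sparsity the two smallest magnitudes are masked.
example_mask = magnitude_mask([0.5, -0.1, 2.0, 0.05], 0.5)
```

In an actual run, each epoch would recompute the mask at that epoch's target sparsity, apply it to the model weights, and then train for one epoch before pruning again.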