reproducibilityindex.ai

Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing

Authors: Aditya Desai, Keren Zhou, Anshumali Shrivastava

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6. Experimental Evaluation Setup: In this section, we evaluate the ROAST compression approach on two types of tasks. The details of the tasks, datasets, and models used are mentioned in Table 2. For image-classification tasks, we choose the CIFAR-10 dataset and the leader for the Dawn Benchmark (Coleman et al., 2017), a ResNet-9 model, for CIFAR-10.
Researcher Affiliation	Collaboration	Aditya Desai (1), Keren Zhou (1), Anshumali Shrivastava (1, 2). (1) Department of Computer Science, Rice University, Houston, Texas, United States; (2) ThirdAI Corp, Houston, Texas, United States.
Pseudocode	Yes	The pseudocode for ROAST-MM is shown in Algorithm 1.
Open Source Code	Yes	The ROAST-MM kernel implementation is open-source (footnote 1: https://github.com/apd10/RzLinear/tree/stable).
Open Datasets	Yes	For image-classification tasks, we choose the CIFAR-10 dataset and the leader for the Dawn Benchmark (Coleman et al., 2017), a ResNet-9 model, for CIFAR-10. ... We use the two largest available text-classification datasets for NLP tasks on Hugging Face (Hugging Face, 2022).
Dataset Splits	No	Table 2 provides the number of samples for training and testing for each dataset (e.g., 'amazon-polarity 3.6M/0.4M' which implies train/test split). However, it does not explicitly provide specific split information or sample counts for a separate validation dataset.
Hardware Specification	Yes	The measurements were taken using TF32 on an NVIDIA A100 GPU (48GB).
Software Dependencies	No	The paper mentions software such as Triton, cuBLAS, CUTLASS, and PyTorch, but does not specify their version numbers, which are necessary for reproducibility.
Experiment Setup	Yes	The other hyperparameters for NLP tasks are {batch size 64 for amazon-polarity and 32 for yelp-polarity, learning rate 2e-5, AdamW optimizer, linear scheduler}. Pruning is used as a baseline. We use iterative magnitude pruning interspersed with training. We use two schedules for pruning. full-9-1-schedule (alt. full-1-9-schedule) means we start with the fully trained model and then perform iterative magnitude pruning to the required sparsity in 9 (alt. 1) epochs and finally perform 1 (alt. 9) epoch at the final sparsity.
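
The two pruning schedules quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `sparsity_schedule` and `magnitude_prune` are hypothetical, the final sparsity of 0.9 is an arbitrary example value, and the linear ramp of the sparsity target over the pruning epochs is an assumption, since the quoted text does not say how sparsity increases between epochs.

```python
def sparsity_schedule(final_sparsity, prune_epochs, hold_epochs):
    """Per-epoch sparsity targets: ramp up over prune_epochs, then hold.

    The linear ramp is an assumption made for illustration only.
    """
    ramp = [final_sparsity * (e + 1) / prune_epochs for e in range(prune_epochs)]
    return ramp + [final_sparsity] * hold_epochs


def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(round(sparsity * len(weights)))
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# full-9-1-schedule: 9 pruning epochs, then 1 epoch at the final sparsity.
full_9_1 = sparsity_schedule(0.9, prune_epochs=9, hold_epochs=1)
# full-1-9-schedule: 1 pruning epoch, then 9 epochs at the final sparsity.
full_1_9 = sparsity_schedule(0.9, prune_epochs=1, hold_epochs=9)

# Toy example: prune half of a four-element weight vector.
pruned = magnitude_prune([0.5, -0.1, 2.0, 0.05], 0.5)
```

In a real run, each epoch would prune to that epoch's target and then train the surviving weights. Training is omitted here, as is the masking needed to keep pruned weights at zero between epochs.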