BALTO: fast tensor program optimization with diversity-based active learning

Authors: Jun Bi, Xiaqing Li, Qi Guo, Rui Zhang, Yuanbo Wen, Xing Hu, Zidong Du, Xinkai Song, Yifan Hao, Yunji Chen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare BALTO against TenSet on 6 typical hardware platforms over 2 learning models. Experimental results show that, on average, BALTO only requires 5% of the total measurements of TenSet to achieve the same or higher model accuracy.
Researcher Affiliation | Collaboration | Jun Bi (1,2,3), Xiaqing Li (2), Qi Guo (2), Rui Zhang (2), Yuanbo Wen (2), Xing Hu (2), Zidong Du (2), Xinkai Song (2), Yifan Hao (2), Yunji Chen (2). Affiliations: 1: University of Science and Technology of China; 2: SKL of Processors, ICT, CAS; 3: Cambricon Technologies, Beijing, China.
Pseudocode | Yes | Algorithm 1: Core-set Greedy Selection; Algorithm 2: Biased-diversity-based Selection
Open Source Code | No | The paper does not provide a link to its own open-source code or explicitly state that its code will be made available. It mentions integrating BALTO into existing frameworks such as TenSet and Meta Scheduler.
Open Datasets | Yes | We evaluate BALTO's effectiveness on the dataset provided by TenSet. The dataset consists of program performance measurement records from 6 different hardware platforms. Each platform includes a total of 8,596,208 tensor program measurement records that are sampled from 2,307 different types of tasks. (Zheng et al., 2021)
Dataset Splits | No | The paper states: 'We use 10% of the records as the test dataset and the remaining 90% records as the train dataset for the baselines.' and 'We further select at most 5% of the training dataset for training BALTO and other active learning approaches.' However, it does not explicitly define a validation split or its size.
Hardware Specification | Yes | We compare BALTO against TenSet on 6 typical hardware platforms (i.e., two GPU platforms and four CPU platforms) over two learning models. GPU-1 and GPU-2 represent NVIDIA T4 and NVIDIA K80, respectively. CPU-1, CPU-2, CPU-3, and CPU-4 represent Intel Platinum 8272CL, Intel E5-2673 v4, AMD EPYC 7452, and ARM Graviton2, respectively.
Software Dependencies | No | The paper mentions machine learning models such as XGBoost and MLP, and discusses compilers and frameworks such as TenSet, Ansor, Meta Scheduler, and TVM, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | All the models adopt the same hyperparameters as TenSet. For the three CNN models, we set the batch size to 1 and the input shape to 224 × 224. For BERT models, we set the sequence length to 128. We assign at most 1,000 trials of hardware measurements for each network and report the average execution time based on 5 independent experiments of program optimization.
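The paper's Algorithm 1 is named Core-set Greedy Selection. As a rough illustration of what such a selection step typically does, here is a minimal sketch of the standard k-center greedy heuristic over program feature vectors; the function name, feature representation, and distance metric are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def coreset_greedy_select(features: np.ndarray, k: int, seed: int = 0) -> list:
    """k-center greedy core-set heuristic (illustrative sketch):
    repeatedly pick the point farthest from the current selection."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    first = int(rng.integers(n))          # random initial center
    selected = [first]
    # distance of every point to its nearest selected center
    dists = np.linalg.norm(features - features[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))       # farthest point joins the core set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

Picking the farthest-remaining point maximizes coverage of the feature space, which is why core-set selection is a natural fit for diversity-based active learning over tensor program candidates.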
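The split protocol quoted above (10% test, 90% train, at most 5% of the train set given to the active learner) can be sketched as follows; the function name and random-permutation mechanics are assumptions, since the paper does not describe how the records are shuffled.

```python
import numpy as np

def make_splits(n_records: int, test_frac: float = 0.10,
                active_frac: float = 0.05, seed: int = 0):
    """Hypothetical split mirroring the quoted protocol:
    10% test, 90% train, and an active-learning budget of
    at most 5% of the training records."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_records)
    n_test = int(n_records * test_frac)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    budget = int(len(train_idx) * active_frac)  # labeling budget cap
    return train_idx, test_idx, budget
```

Note that no validation split appears here, matching the report's observation that the paper does not define one.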
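The stated experiment settings can be collected into a single configuration for reference; the key names below are illustrative, not the authors' actual configuration schema.

```python
# Hypothetical configuration summarizing the quoted setup.
EXPERIMENT_SETUP = {
    "cnn": {"batch_size": 1, "input_shape": (224, 224)},  # three CNN models
    "bert": {"sequence_length": 128},
    "max_measurement_trials": 1000,  # hardware-measurement trials per network
    "independent_runs": 5,           # execution time averaged over these runs
}
```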