Tensor Program Optimization with Probabilistic Programs
Authors: Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): This section aims to answer the following question: Is MetaSchedule expressive enough to capture the search space of state-of-the-art optimization techniques? To answer this question, we evaluate our work on a diverse set of operators and subgraphs extracted from popular deep learning models, including variants of convolution, dense, and normalization. As baselines, PyTorch (v1.11.0) results are provided to compare performance with vendor libraries; TVM (commit: 8d4f4dd73f), which incorporates AutoTVM [10] and Ansor [43], is used as the state-of-the-art tensor program optimization system, and we pick the better of the two in each respective setup. Full operator and hardware configurations are documented in Appendix A.2. Figure 8 shows that, in all cases on CPU and GPU, MetaSchedule delivers performance comparable with or even better than TVM, from which we infer that MetaSchedule can express optimization techniques comparable to TVM's on diverse workloads. Additionally, in most cases MetaSchedule outperforms PyTorch by a significant margin, except for SFM, which is heavily hand-optimized in PyTorch. |
| Researcher Affiliation | Collaboration | Junru Shao (OctoML, jshao@octoml.ai); Xiyou Zhou (OctoML, xiyou@octoml.ai); Siyuan Feng (Shanghai Jiao Tong University); Bohan Hou (Carnegie Mellon University); Ruihang Lai (Carnegie Mellon University); Hongyi Jin (Carnegie Mellon University); Wuwei Lin (OctoML, wlin@octoml.ai); Masahiro Masuda (OctoML, mmasuda@octoml.ai); Cody Hao Yu (Amazon Web Services, hyuz@amazon.com); Tianqi Chen (Carnegie Mellon University and OctoML, tqchen@cmu.edu, tqchen@octoml.ai) |
| Pseudocode | Yes | Figure 3: The MetaSchedule probabilistic language. The language contains two key elements: (1) sampling of random variables; (2) program transformation based on random variables. An example execution instance: Step 1: draw tile sizes and then organize the loops into a two-level tiling structure. Step 2: decide where to fuse the ReLU operator. (Shows pseudocode-like structure with `def Probabilistic-Program():`.) Figure 4: Transformation modules. A transformation module consists of tensor program analysis, sampling, and stochastic transformations. The figure uses Multi-Level-Tiling as an example. (Shows pseudocode-like structure with `def Multi-Level-Tiling(loop_nest: List[Loop]):`.) A minimal code sketch of this pattern appears after the table. |
| Open Source Code | No | We will not include the URL for the codebase for anonymity, and will release the link after the review process. Therefore, we will open-source our framework and hope it can enable broader collaboration between machine learning deployment engineers and intelligent machine learning algorithms for tensor programs. |
| Open Datasets | Yes | Therefore, a series of experiments are conducted to compare MetaSchedule and TVM, including BERT-Base [14], ResNet-50 [20], and MobileNet-v2 [36] on both CPU and GPU. |
| Dataset Splits | No | The provided paper text does not explicitly specify training, validation, or test dataset splits (e.g., percentages or sample counts). While the checklist mentions 'Operator configurations and hyperparameters for evolutionary search are shown in the Appendix,' the appendix itself is not included in the provided text. |
| Hardware Specification | Yes | in all cases on CPU and GPU... it is usually efficient to use 512-bit vectorization over the inner loop when AVX-512 vector instructions are available on Intel CPUs (see the vectorization sketch after the table). |
| Software Dependencies | Yes | As baselines, PyTorch (v1.11.0) results are provided to compare performance with vendor libraries; TVM (commit: 8d4f4dd73f), which incorporates AutoTVM [10] and Ansor [43], is used as the state-of-the-art tensor program optimization system, and we pick the better of the two in each respective setup. |
| Experiment Setup | No | The paper states that 'Operator configurations and hyperparameters for evolutionary search are shown in the Appendix,' but the appendix itself is not provided in the given text. Therefore, specific experimental setup details like hyperparameter values are not present in the main body. |
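To give a concrete feel for the pseudocode summarized in the Pseudocode row, the following is a minimal sketch (not the paper's figure) of the same two steps — sampling tile sizes into a two-level tiling and then fusing a ReLU consumer — written against TVM's public TensorIR schedule primitives (`sample_perfect_tile`, `split`, `reorder`, `reverse_compute_at`). The matmul-plus-ReLU workload, the tiling depth, and the choice of fusion loop are illustrative assumptions, not taken from the paper.

```python
# Sketch of a MetaSchedule-style probabilistic program using TVM's TensorIR
# schedule primitives. Workload (matmul + ReLU), tiling depth, and the fusion
# loop are illustrative assumptions.
import tvm
from tvm import te, tir

M, N, K = 128, 128, 128
A = te.placeholder((M, K), "float32", name="A")
B = te.placeholder((K, N), "float32", name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
relu = te.compute((M, N), lambda i, j: tir.max(C[i, j], tir.FloatImm("float32", 0.0)), name="relu")

sch = tir.Schedule(te.create_prim_func([A, B, relu]))
block_c = sch.get_block("C")
i, j, kr = sch.get_loops(block_c)

# Step 1: draw tile sizes as random variables, then build a two-level tiling.
i0, i1 = sch.split(i, factors=sch.sample_perfect_tile(i, n=2))
j0, j1 = sch.split(j, factors=sch.sample_perfect_tile(j, n=2))
sch.reorder(i0, j0, i1, j1, kr)

# Step 2: decide where to fuse the ReLU consumer. The outer j tile is chosen
# here for illustration; in MetaSchedule the fusion level is itself sampled.
sch.reverse_compute_at(sch.get_block("relu"), j0)

print(sch.mod.script())  # one concrete program drawn from the search space
```

Each run of this sketch draws different tile sizes, so repeatedly executing it enumerates different concrete programs from the same probabilistic specification, which is the core idea behind the search space construction described in the paper.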
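The AVX-512 remark in the Hardware Specification row can be made concrete in the same way. Below is a small sketch (again using TVM's TensorIR schedule, not code from the paper) that splits an inner loop into 16 float32 lanes (16 × 32 bits = 512 bits) and marks it for vectorization; the elementwise-add workload and the factor of 16 are illustrative.

```python
# Illustrative sketch: split an inner loop into 16 float32 lanes (512 bits)
# and vectorize it — the kind of decision that pays off on AVX-512 CPUs.
import tvm
from tvm import te, tir

n = 4096
A = te.placeholder((n,), "float32", name="A")
B = te.placeholder((n,), "float32", name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

sch = tir.Schedule(te.create_prim_func([A, B, C]))
(i,) = sch.get_loops(sch.get_block("C"))
outer, inner = sch.split(i, factors=[None, 16])  # 16 x float32 = 512-bit lanes
sch.vectorize(inner)

print(sch.mod.script())
```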