Tensor Program Optimization with Probabilistic Programs
Authors: Junru Shao, Xiyou Zhou, Siyuan Feng, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Masahiro Masuda, Cody Hao Yu, Tianqi Chen
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): This section aims to answer the following question: Is MetaSchedule expressive enough to capture the search space of state-of-the-art optimization techniques? To answer this question, we evaluate our work on a diverse set of operators and subgraphs extracted from popular deep learning models, including variants of convolution, dense, and normalization. As baselines, PyTorch (v1.11.0) results are provided to compare performance with vendor libraries; TVM (commit: 8d4f4dd73f), which incorporates AutoTVM [10] and Ansor [43], is used as the state-of-the-art tensor program optimization system, and we pick the better of the two in each respective setup. Full operator and hardware configurations are documented in Appendix A.2. Figure 8 shows that, in all cases on CPU and GPU, MetaSchedule delivers performance comparable with or even better than TVM, from which we infer that MetaSchedule can express optimization techniques comparable to TVM's on diverse workloads. Additionally, in most cases MetaSchedule outperforms PyTorch by a significant margin, except for SFM, which is heavily hand-optimized in PyTorch. |
| Researcher Affiliation | Collaboration | Junru Shao (OctoML, jshao@octoml.ai); Xiyou Zhou (OctoML, xiyou@octoml.ai); Siyuan Feng (Shanghai Jiao Tong University); Bohan Hou (Carnegie Mellon University); Ruihang Lai (Carnegie Mellon University); Hongyi Jin (Carnegie Mellon University); Wuwei Lin (OctoML, wlin@octoml.ai); Masahiro Masuda (OctoML, mmasuda@octoml.ai); Cody Hao Yu (Amazon Web Services, hyuz@amazon.com); Tianqi Chen (Carnegie Mellon University and OctoML, tqchen@cmu.edu, tqchen@octoml.ai) |
| Pseudocode | Yes | Figure 3: The MetaSchedule probabilistic language. The language contains two key elements: (1) sampling of random variables; (2) program transformation based on random variables. An example execution instance: Step 1: draw tile sizes and then organize the loops into a two-level tiling structure. Step 2: decide where to fuse the ReLU operator. (Shows pseudocode-like structure with `def Probabilistic-Program():`.) Figure 4: Transformation modules. A transformation module consists of tensor program analysis, sampling, and stochastic transformations. The figure uses Multi-Level-Tiling as an example. (Shows pseudocode-like structure with `def Multi-Level-Tiling(loop_nest: List[Loop]):`.) A minimal code sketch of this pattern appears after the table. |
| Open Source Code | No | We will not include the URL for the codebase for anonymity, and will release the link after the review process. Therefore, we will open-source our framework and hope it can enable broader collaboration between machine learning deployment engineers and intelligent machine learning algorithms for tensor programs. |
| Open Datasets | Yes | Therefore, a series of experiments are conducted to compare MetaSchedule and TVM, including BERT-Base [14], ResNet-50 [20], and MobileNet-v2 [36] on both CPU and GPU. |
| Dataset Splits | No | The provided paper text does not explicitly specify training, validation, or test dataset splits (e.g., percentages or sample counts). While the checklist mentions 'Operator configurations and hyperparameters for evolutionary search are shown in the Appendix,' the appendix itself is not included in the provided text. |
| Hardware Specification | Yes | in all cases on CPU and GPU... it is usually efficient to use 512-bit vectorization over the inner loop when AVX-512 vector instructions are available on Intel CPUs (see the vectorization sketch after the table). |
| Software Dependencies | Yes | As baselines, PyTorch (v1.11.0) results are provided to compare performance with vendor libraries; TVM (commit: 8d4f4dd73f), which incorporates AutoTVM [10] and Ansor [43], is used as the state-of-the-art tensor program optimization system, and we pick the better of the two in each respective setup. |
| Experiment Setup | No | The paper states that 'Operator configurations and hyperparameters for evolutionary search are shown in the Appendix,' but the appendix itself is not provided in the given text. Therefore, specific experimental setup details like hyperparameter values are not present in the main body. |
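To give a concrete feel for the pseudocode summarized in the Pseudocode row, the following is a minimal sketch (not the paper's figure) of the same two steps — sampling tile sizes into a two-level tiling and then fusing a ReLU consumer — written against TVM's public TensorIR schedule primitives (`sample_perfect_tile`, `split`, `reorder`, `reverse_compute_at`). The matmul-plus-ReLU workload, the tiling depth, and the choice of fusion loop are illustrative assumptions, not taken from the paper.

```python
# Sketch of a MetaSchedule-style probabilistic program using TVM's TensorIR
# schedule primitives. Workload (matmul + ReLU), tiling depth, and the fusion
# loop are illustrative assumptions.
import tvm
from tvm import te, tir

M, N, K = 128, 128, 128
A = te.placeholder((M, K), "float32", name="A")
B = te.placeholder((K, N), "float32", name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
relu = te.compute((M, N), lambda i, j: tir.max(C[i, j], tir.FloatImm("float32", 0.0)), name="relu")

sch = tir.Schedule(te.create_prim_func([A, B, relu]))
block_c = sch.get_block("C")
i, j, kr = sch.get_loops(block_c)

# Step 1: draw tile sizes as random variables, then build a two-level tiling.
i0, i1 = sch.split(i, factors=sch.sample_perfect_tile(i, n=2))
j0, j1 = sch.split(j, factors=sch.sample_perfect_tile(j, n=2))
sch.reorder(i0, j0, i1, j1, kr)

# Step 2: decide where to fuse the ReLU consumer. The outer j tile is chosen
# here for illustration; in MetaSchedule the fusion level is itself sampled.
sch.reverse_compute_at(sch.get_block("relu"), j0)

print(sch.mod.script())  # one concrete program drawn from the search space
```

Each run of this sketch draws different tile sizes, so repeatedly executing it enumerates different concrete programs from the same probabilistic specification, which is the core idea behind the search space construction described in the paper.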
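The AVX-512 remark in the Hardware Specification row can be made concrete in the same way. Below is a small sketch (again using TVM's TensorIR schedule, not code from the paper) that splits an inner loop into 16 float32 lanes (16 × 32 bits = 512 bits) and marks it for vectorization; the elementwise-add workload and the factor of 16 are illustrative.

```python
# Illustrative sketch: split an inner loop into 16 float32 lanes (512 bits)
# and vectorize it — the kind of decision that pays off on AVX-512 CPUs.
import tvm
from tvm import te, tir

n = 4096
A = te.placeholder((n,), "float32", name="A")
B = te.placeholder((n,), "float32", name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

sch = tir.Schedule(te.create_prim_func([A, B, C]))
(i,) = sch.get_loops(sch.get_block("C"))
outer, inner = sch.split(i, factors=[None, 16])  # 16 x float32 = 512-bit lanes
sch.vectorize(inner)

print(sch.mod.script())
```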