Learning to Optimize Tensor Programs
Authors: Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our framework delivers performance that is competitive with state-of-the-art hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs. We provide a detailed empirical analysis of component design choices in this framework. Experimental results on real-world DL workloads show that our framework yields end-to-end performance improvements ranging from 1.2× to 3.8× over existing frameworks. |
| Researcher Affiliation | Academia | ¹Paul G. Allen School of Computer Science & Engineering, University of Washington; ²Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: Learning to Optimize Tensor Programs |
| Open Source Code | Yes | Our framework can be found at https://tvm.ai. |
| Open Datasets | Yes | Component evaluations were based on convolution workloads in ResNet-18 [14] for ImageNet classification (Table 1). |
| Dataset Splits | No | The paper mentions using ResNet-18 and MobileNet, which are typically evaluated on standard datasets with predefined splits, but it does not explicitly provide the specific training, validation, or test dataset splits (e.g., percentages or counts) used for their experiments. |
| Hardware Specification | Yes | We compared our approach to existing DL frameworks backed by highly engineered hardware-specific libraries on diverse hardware back-ends: a server-class GPU, an embedded CPU, and a mobile GPU. The baselines were: cuDNN v7 for the NVIDIA GPU, TFLite (commit 7558b085) for the Cortex-A53, and the ARM Compute Library (v18.03) for the ARM Mali GPU. |
| Software Dependencies | Yes | The baselines were: cuDNN v7 for the NVIDIA GPU, TFLite (commit 7558b085) for the Cortex-A53, and the ARM Compute Library (v18.03) for the ARM Mali GPU. Our baselines were: MXNet (v1.1) and TensorFlow (v1.7) for the GPU, TFLite (commit 7558b085) for the Cortex-A53, and the ARM Compute Library (v18.03) for the ARM Mali GPU. |
| Experiment Setup | Yes | Algorithm 1 (Learning to Optimize Tensor Programs). Input: transformation space S_e; Output: selected schedule configuration s*. Initialize D ← ∅; while n_trials < max_n_trials: (1) pick the next promising batch — run parallel simulated annealing to collect candidates Q in S_e using the energy function f̂, run greedy submodular optimization to pick a (1 − ε)b-subset S of Q by maximizing Equation 3, then add εb randomly sampled candidates to S; (2) run measurement on the hardware environment — for each s in S, c ← f(g(e, s)) and D ← D ∪ {(e, s, c)}; (3) update the cost model f̂ using D; n_trials ← n_trials + b. Return the history-best schedule configuration s*. (A hedged sketch of this loop appears below the table.) We randomly picked samples from D collected from C1, C2, C3, C4, C5, C6 and used them to form the source domain (30,000 samples in the TITAN X experiment and 20,000 samples in the ARM GPU and ARM A53 experiments). |
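
To make the structure of Algorithm 1 concrete, below is a minimal Python sketch of the exploration loop. It is an illustration under simplifying assumptions, not the authors' implementation (the released code is at https://tvm.ai): the search `space`, the `measure` function, and `ToyCostModel` are placeholders, random proposals stand in for parallel simulated annealing, and greedy ranking by predicted cost stands in for the diversity-aware submodular batch selection of Equation 3.

```python
# Minimal sketch of the exploration loop in Algorithm 1 (assumptions, not the
# authors' code): random proposals replace parallel simulated annealing, and
# greedy ranking by predicted cost replaces the submodular subset selection.
import random


class ToyCostModel:
    """Toy stand-in for the statistical cost model f-hat (the paper uses
    gradient-boosted trees or a TreeGRU over features of the loop program)."""

    def __init__(self):
        self.observed = {}

    def fit(self, history):
        # A real model generalizes from program features; this one only
        # memorizes the costs it has already measured.
        self.observed = {cfg: cost for cfg, cost in history}

    def predict(self, cfg):
        # Unseen configurations get an optimistic estimate of 0.0.
        return self.observed.get(cfg, 0.0)


def explore(space, measure, cost_model, batch_size=8, eps=0.25, max_trials=64):
    """Return the best configuration found within the trial budget."""
    history = []  # D: (configuration, measured cost) pairs
    trials = 0
    while trials < max_trials:
        # Pick the next promising batch of candidates from the space.
        candidates = random.sample(space, min(len(space), 4 * batch_size))
        # Keep the (1 - eps) * b candidates the model predicts to be fastest ...
        exploit = int(round((1 - eps) * batch_size))
        batch = sorted(candidates, key=cost_model.predict)[:exploit]
        # ... and add eps * b uniformly random candidates for exploration.
        batch += random.sample(space, batch_size - exploit)
        # Run measurement on the hardware environment.
        for s in batch:
            history.append((s, measure(s)))
        # Update the cost model with everything measured so far.
        cost_model.fit(history)
        trials += batch_size
    return min(history, key=lambda pair: pair[1])[0]  # history-best configuration


# Illustrative usage with a synthetic search space and a fake cost function.
space = [(tile, unroll) for tile in (1, 2, 4, 8, 16) for unroll in (0, 1)]
measure = lambda cfg: abs(cfg[0] - 8) + cfg[1]  # pretend hardware measurement
print("best configuration:", explore(space, measure, ToyCostModel()))
```

The key design choice the sketch preserves is the ε-greedy batch: most of each batch exploits the learned cost model, while a small random fraction keeps gathering data the model has not seen, which is what lets the model improve as measurements accumulate.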