DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation
Authors: Minjia Zhang, Menghao Li, Chi Wang, Mingqin Li
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate and compare DynaTune with the state-of-the-art DL compiler. The experiment results show that DynaTune is 1.2–2.4 times faster to achieve the same optimization quality for a range of models across different hardware architectures. |
| Researcher Affiliation | Industry | Minjia Zhang, Menghao Li*, Chi Wang & Mingqin Li, Microsoft Corporation {minjiaz,t-meli,wang.chi,mingqli}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 DynaTune: Dynamic Multi-Tensor-Operator Optimization |
| Open Source Code | No | The paper does not explicitly provide a link to the source code for DynaTune or state that it is open-sourced or available. |
| Open Datasets | Yes | We include four tasks, covering both CPU and GPU hardware: ResNet-18 (He et al., 2016) and SqueezeNet (Iandola et al., 2016) on CPU... VGG (Simonyan & Zisserman, 2015) and Transformer Encoder (Vaswani et al., 2017) on GPUs... |
| Dataset Splits | No | The paper refers to 'train', 'validation', and 'test' in the context of a general compilation pipeline (Fig. 1) but does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for the models evaluated in their experiments. |
| Hardware Specification | Yes | ResNet-18 (He et al., 2016) and SqueezeNet (Iandola et al., 2016) on CPU (Intel Xeon CPU E5-2690 v3 @ 2.60GHz), VGG (Simonyan & Zisserman, 2015) and Transformer Encoder (Vaswani et al., 2017) on GPUs (Nvidia Tesla P100) |
| Software Dependencies | No | The paper mentions 'AutoTVM', 'Python', and 'emcee' for implementation, but it does not specify concrete version numbers for these software components (e.g., Python 3.x, emcee vX.Y). |
| Experiment Setup | Yes | We use the default hyperparameters provided by AutoTVM for the underlying code optimization. To obtain the parameter posterior, we run the ensemble MCMC with 10 walkers and 500 sampling steps. For UCB, we choose a default value of C = 2 suggested by the theory in Auer et al. (2002), which we find to be robust to different ranges of latencies. When the initial latency is <1ms, we empirically find that C = 0.2 leads to increased performance, which we report. (A minimal sketch of this setup follows the table.) |
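
The Experiment Setup row quotes three concrete choices: an ensemble MCMC posterior with 10 walkers and 500 sampling steps (implemented with emcee, per the Software Dependencies row) and a UCB exploration constant C = 2, relaxed to C = 0.2 for sub-millisecond initial latencies. The sketch below is a minimal illustration of how those pieces could fit together; the exponential latency-curve model, the toy observations, and the specific gain definition used in the UCB score are assumptions made here for illustration, not the authors' implementation.

```python
# Hedged sketch only: illustrates the quoted hyperparameters (10 walkers,
# 500 steps, UCB constant C) with emcee. The latency model y = a * exp(-b * t)
# is an assumption, not the belief model from the DynaTune paper.
import numpy as np
import emcee

def log_prob(theta, t_obs, y_obs):
    """Log-posterior for a toy exponential latency curve y = a * exp(-b * t)."""
    a, b = theta
    if a <= 0 or b <= 0:                     # flat prior on positive parameters
        return -np.inf
    pred = a * np.exp(-b * t_obs)
    return -0.5 * np.sum((y_obs - pred) ** 2)  # Gaussian likelihood, unit noise

# Toy observations: (tuning time spent on one operator, measured latency in ms)
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([4.0, 3.1, 2.5, 2.1])

n_walkers, n_dim, n_steps = 10, 2, 500       # walker/step counts quoted above
p0 = np.abs(np.random.randn(n_walkers, n_dim)) + 1.0
sampler = emcee.EnsembleSampler(n_walkers, n_dim, log_prob, args=(t_obs, y_obs))
sampler.run_mcmc(p0, n_steps)
posterior = sampler.get_chain(discard=100, flat=True)

# UCB-style score for allocating the next tuning time slot to this operator:
# expected latency reduction plus C times its posterior spread (C = 2 default,
# C = 0.2 when the initial latency is below 1 ms, per the quote above).
C = 2.0
a_s, b_s = posterior[:, 0], posterior[:, 1]
gain = a_s * np.exp(-b_s * t_obs[-1]) - a_s * np.exp(-b_s * (t_obs[-1] + 1.0))
ucb_score = gain.mean() + C * gain.std()
print(f"UCB score for one more time slot on this operator: {ucb_score:.4f}")
```

In this sketch the score would be computed per operator and the operator with the highest score would receive the next optimization time slot; the functional form of the predicted gain is purely a stand-in for whatever learning-curve model the paper actually uses.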