DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation

Authors: Minjia Zhang, Menghao Li, Chi Wang, Mingqin Li

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate and compare DynaTune with the state-of-the-art DL compiler. The experiment results show that DynaTune is 1.2-2.4 times faster to achieve the same optimization quality for a range of models across different hardware architectures.
Researcher Affiliation | Industry | Minjia Zhang, Menghao Li*, Chi Wang & Mingqin Li, Microsoft Corporation, {minjiaz,t-meli,wang.chi,mingqli}@microsoft.com
Pseudocode | Yes | Algorithm 1 DynaTune: Dynamic Multi-Tensor-Operator Optimization (a hedged sketch of this kind of scheduling loop appears after the table)
Open Source Code | No | The paper does not explicitly provide a link to the source code for DynaTune or state that it is open-sourced or available.
Open Datasets | Yes | We include four tasks, covering both CPU and GPU hardware: ResNet-18 (He et al., 2016) and SqueezeNet (Iandola et al., 2016) on CPU... VGG (Simonyan & Zisserman, 2015) and Transformer Encoder (Vaswani et al., 2017) on GPUs...
Dataset Splits | No | The paper refers to 'train', 'validation', and 'test' in the context of a general compilation pipeline (Fig. 1) but does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for the models evaluated in their experiments.
Hardware Specification | Yes | ResNet-18 (He et al., 2016) and SqueezeNet (Iandola et al., 2016) on CPU (Intel Xeon CPU E5-2690 v3 @ 2.60GHz), VGG (Simonyan & Zisserman, 2015) and Transformer Encoder (Vaswani et al., 2017) on GPUs (Nvidia Tesla P100)
Software Dependencies | No | The paper mentions 'AutoTVM', 'Python', and 'emcee' for implementation, but it does not specify concrete version numbers for these software components (e.g., Python 3.x, emcee vX.Y).
Experiment Setup | Yes | We use the default hyperparameters provided by AutoTVM for the underlying code optimization. To obtain the parameter posterior, we run the ensemble MCMC with 10 walkers and 500 sampling steps. For UCB, we choose a default value of C = 2 suggested by the theory in Auer et al. (2002), which we find to be robust to different ranges of latencies. When the initial latency is <1ms, we empirically find that C = 0.2 leads to increased performance, which we report.
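
The pseudocode row above names Algorithm 1 (DynaTune: Dynamic Multi-Tensor-Operator Optimization) but does not reproduce it. The Python sketch below is only an illustration of the kind of UCB-driven time-slice allocation such a multi-tensor-operator scheduler could use, with the exploration constant C from the experiment setup; the function names (ucb_score, allocate_time_slices, tune_one_slice) and the latency-reduction reward are assumptions, not the authors' code.

# Hypothetical illustration (not the authors' code) of a UCB-driven time-slice
# scheduler in the spirit of "Dynamic Multi-Tensor-Operator Optimization":
# each tensor operator is treated as an arm, and each tuning time slice is
# given to the operator with the highest upper-confidence-bound score.
import math

C = 2.0  # exploration constant; the paper reports C = 2 (C = 0.2 when initial latency < 1 ms)

def ucb_score(mean_reward, times_selected, total_selections, c=C):
    """UCB1-style score: estimated reward plus an exploration bonus (Auer et al., 2002)."""
    if times_selected == 0:
        return float("inf")  # ensure every operator gets at least one slice
    bonus = c * math.sqrt(math.log(total_selections) / times_selected)
    return mean_reward + bonus

def allocate_time_slices(operators, num_slices, tune_one_slice):
    """operators: list of operator names; tune_one_slice(op) returns the observed
    latency reduction (the 'reward') from tuning op for one time slice."""
    counts = {op: 0 for op in operators}
    total_reward = {op: 0.0 for op in operators}
    for t in range(1, num_slices + 1):
        scores = {
            op: ucb_score(total_reward[op] / counts[op] if counts[op] else 0.0,
                          counts[op], t)
            for op in operators
        }
        chosen = max(scores, key=scores.get)
        counts[chosen] += 1
        total_reward[chosen] += tune_one_slice(chosen)  # e.g., run the underlying tuner for one slice
    return counts, total_reward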
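
The experiment setup also reports an ensemble MCMC with 10 walkers and 500 sampling steps, and the software-dependency row names emcee. Below is a minimal sketch of that configuration, assuming emcee 3.x, a placeholder standard-normal log-probability in place of the paper's actual belief model, and an assumed parameter count (ndim = 2) chosen purely for illustration.

# A minimal sketch, assuming emcee 3.x, of the reported posterior-sampling
# configuration (10 walkers, 500 sampling steps). The log-probability is a
# placeholder standard normal; the paper's actual model of the latency-curve
# parameters is not shown in this summary.
import numpy as np
import emcee

def log_prob(theta):
    # placeholder log-density; replace with the belief model of interest
    return -0.5 * np.sum(theta ** 2)

ndim = 2        # number of model parameters (assumed for illustration)
nwalkers = 10   # "10 walkers" from the experiment setup
nsteps = 500    # "500 sampling steps" from the experiment setup

initial = np.random.randn(nwalkers, ndim)            # starting positions of the walkers
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(initial, nsteps)
posterior_samples = sampler.get_chain(flat=True)     # (nwalkers * nsteps, ndim) array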