A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

Authors: Mejbah Alam, Justin Gottschlich, Nesime Tatbul, Javier S. Turek, Tim Mattson, Abdullah Muzahid

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
Researcher Affiliation | Collaboration | Mejbah Alam (Intel Labs, mejbah.alam@intel.com); Justin Gottschlich (Intel Labs, justin.gottschlich@intel.com); Nesime Tatbul (Intel Labs and MIT, tatbul@csail.mit.edu); Javier Turek (Intel Labs, javier.turek@intel.com); Timothy Mattson (Intel Labs, timothy.g.mattson@intel.com); Abdullah Muzahid (Texas A&M University, abdullah.muzahid@tamu.edu)
Pseudocode | No | The paper describes the system components and their interactions but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement or a direct link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We used 7 programs with known performance defects from the PARSEC [17] and the Phoenix [57] benchmark suites. Additionally, we evaluated 3 open-source programs: Boost [2], Memcached [4], and MySQL [5].
Dataset Splits | No | The paper describes training on an 'old version' and testing on a 'new version', and mentions running each program 'n number of times', but it does not specify the standard dataset splits (e.g., 80/10/10 percentages or per-split sample counts) or cross-validation methodology typically associated with ML reproducibility.
Hardware Specification | Yes | We performed all experiments on a 12-core dual socket Intel Xeon® Scalable 8268 processor [3] with 32GB RAM.
Software Dependencies | No | The paper mentions 'PAPI to read hardware performance counter values [49]' (see the PAPI sketch after the table) and 'Keras with TensorFlow to implement autoencoders [19]' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Given two versions of a software program, AutoPerf first compares their performance. If a degradation is observed, then the cause is likely to lie within the functions that differ in the two versions. Hence, AutoPerf automatically annotates the modified functions in both versions of the program and collects their HWPC profiles. The data collected for the older version is used for zero-positive model training, whereas the data collected for the newer version is used for inferencing based on the trained model. AutoPerf uses an autoencoder neural network to model normal performance behavior of a function [60]. To scale with a large number of functions, training data for functions with similar performance signatures are clustered together using k-means clustering and a single autoencoder model per cluster is trained [35]. Performance regressions are identified by measuring the reconstruction error that results from testing the autoencoders with profile data from the new version of the program. If the error comes out to be sufficiently high, then the corresponding execution of the function is marked as a performance bug and its root cause is analyzed as the final step of the diagnosis. ... The t parameter controls the level of thresholding. For example, with t = 2, the threshold provides (approximately) a 95% confidence interval for the reconstruction error.
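
To make the quoted pipeline concrete, below is a minimal sketch of the zero-positive workflow it describes: cluster functions by performance signature with k-means, train one autoencoder per cluster on the old version's profiles, and flag new-version runs whose reconstruction error exceeds mean + t·std. The dense network topology, the hyperparameters, and the synthetic `old_profiles` stand-in for HWPC data are our assumptions for illustration, not the paper's settings; the paper states only that it uses Keras/TensorFlow autoencoders and k-means clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow import keras

def build_autoencoder(n_features, latent_dim=4):
    # Small dense autoencoder; the paper does not quote its exact topology.
    inputs = keras.Input(shape=(n_features,))
    h = keras.layers.Dense(16, activation="relu")(inputs)
    z = keras.layers.Dense(latent_dim, activation="relu")(h)
    h = keras.layers.Dense(16, activation="relu")(z)
    outputs = keras.layers.Dense(n_features, activation="sigmoid")(h)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Stand-in HWPC profiles: function name -> (n_runs, n_counters) array,
# values normalized to [0, 1]. Real data would come from PAPI counters.
rng = np.random.default_rng(0)
old_profiles = {f"fn_{i}": rng.random((50, 8)) for i in range(6)}
k = 2  # number of k-means clusters (illustrative)

# Cluster functions whose performance signatures (mean profiles) are similar,
# so one autoencoder can model each cluster instead of one per function.
signatures = np.stack([runs.mean(axis=0) for runs in old_profiles.values()])
labels = KMeans(n_clusters=k, n_init=10).fit_predict(signatures)

models, thresholds = {}, {}
for c in range(k):
    members = [fn for fn, lbl in zip(old_profiles, labels) if lbl == c]
    X = np.vstack([old_profiles[fn] for fn in members])  # nominal runs only
    ae = build_autoencoder(X.shape[1])
    ae.fit(X, X, epochs=50, batch_size=32, verbose=0)  # zero-positive training
    err = np.mean((ae.predict(X, verbose=0) - X) ** 2, axis=1)
    # Threshold = mean + t * std of the training error; t = 2 corresponds to
    # roughly a 95% interval if errors are approximately Gaussian.
    thresholds[c] = err.mean() + 2.0 * err.std()
    models[c] = ae

def flag_regressions(cluster, new_runs):
    """Mark new-version executions whose reconstruction error is anomalous."""
    err = np.mean((models[cluster].predict(new_runs, verbose=0) - new_runs) ** 2, axis=1)
    return err > thresholds[cluster]

print(flag_regressions(0, rng.random((10, 8))))  # e.g., an array of booleans
```

Note that under this zero-positive framing only nominal (old-version) runs are needed for training, which is consistent with the Dataset Splits finding above: no labeled regression examples, and hence no conventional train/validation/test split, appear in the paper.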
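
Regarding the unversioned dependencies noted in the Software Dependencies row: the paper reads hardware performance counters through PAPI. As a hedged illustration only, the pypapi Python bindings expose a similar high-level interface; the events and workload below are placeholders on our part, not the counters AutoPerf actually samples.

```python
from pypapi import papi_high as high
from pypapi import events as papi_events

# Count retired instructions and L2 total cache misses around a region of
# interest -- a stand-in for the per-function HWPC profiles AutoPerf collects.
high.start_counters([papi_events.PAPI_TOT_INS, papi_events.PAPI_L2_TCM])

total = sum(i * i for i in range(1_000_000))  # placeholder workload

tot_ins, l2_tcm = high.stop_counters()  # one value per started event
print(f"instructions={tot_ins}, l2_misses={l2_tcm}")
```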