Neural Code Comprehension: A Learnable Representation of Code Semantics
Authors: Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Neural Code Comprehension is evaluated on multiple levels, using clustering and analogies for inst2vec, as well as three different code comprehension tasks for XFGs: algorithm classification; heterogeneous compute device (e.g., CPU, GPU) mapping; and optimal thread coarsening factor prediction, which model the runtime of an application without running it. (Page 2) and In this section, we evaluate inst2vec on three different tasks, comparing with manually-extracted features and state-of-the-art specialized deep learning approaches. (Page 5) |
| Researcher Affiliation | Academia | Tal Ben-Nun, ETH Zurich, Zurich 8092, Switzerland (talbn@inf.ethz.ch); Alice Shoshana Jakobovits, ETH Zurich, Zurich 8092, Switzerland (alicej@student.ethz.ch); Torsten Hoefler, ETH Zurich, Zurich 8092, Switzerland (htor@inf.ethz.ch) |
| Pseudocode | No | The paper describes processes using text and diagrams, but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, datasets, trained embeddings, and results available at https://www.github.com/spcl/ncc (Footnote 1, Page 2) |
| Open Datasets | Yes | The algorithm classification task uses the POJ-104 [49] dataset, collected from a Pedagogical Open Judge system. (Page 5) and For the compute device mapping and optimal thread coarsening factor prediction tasks, we use an OpenCL code dataset provided by Cummins et al. [18]. (Page 5) and Code, datasets, trained embeddings, and results available at https://www.github.com/spcl/ncc (Footnote 1, Page 2) |
| Dataset Splits | Yes | Our data preparation follows the experiment conducted by Mou et al. [49], splitting the dataset 3:1:1 for training, validation, and testing. (Page 5, Section 6.1) and We concatenate the data and work-group sizes to the network inputs, and train with stratified 10-fold cross-validation. (Page 6, Section 6.2) See the split sketch after this table. |
| Hardware Specification | Yes | Next, we use Neural Code Comprehension to predict whether a given OpenCL program will run faster on a CPU (Intel Core i7-3820) or a GPU (AMD Tahiti 7970 and NVIDIA GTX 970) given its code, input data size, and work-group size (Page 6, Section 6.2) and Table 5 results per computing platform (Magni et al. [46] / DeepTune [18] / DeepTune-TL [18] / inst2vec / inst2vec-imm): AMD Radeon HD 5900: 1.21 / 1.10 / 1.17 / 1.37 / 1.28; AMD Tahiti 7970: 1.01 / 1.05 / 1.23 / 1.10 / 1.18; NVIDIA GTX 480: 0.86 / 1.10 / 1.14 / 1.07 / 1.11; NVIDIA Tesla K20c: 0.94 / 0.99 / 0.93 / 1.06 / 1.00 (Table 5, Page 7) |
| Software Dependencies | No | The paper mentions software like TensorFlow, Adam optimizer, LLVM, Clang, and Flang, but does not provide specific version numbers for these software components as used in the experiments. |
| Experiment Setup | Yes | Training: Our recurrent network (see schematic description in Appendix B) consists of an inst2vec input with an XFG context size of 2, followed by two stacked LSTM [33] layers with 200 units in each layer, batch normalization [35], a dense 32-neuron layer with ReLU activations, and output units matching the number of classes. The loss function is a categorical cross-entropy trained using Adam [37] with the default hyperparameters. (Page 5) and We train inst2vec with an embedding dimension of 200 for 5 epochs using TensorFlow [1]. (Page 5, Section 5.1) See the model sketch after this table. |
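The dataset-split protocols quoted in the "Dataset Splits" row (a 3:1:1 train/validation/test split for POJ-104 and stratified 10-fold cross-validation for the OpenCL tasks) can be reproduced with standard utilities. The sketch below is an assumption: it uses scikit-learn, and the `samples`/`labels` arrays are placeholders rather than the paper's actual data pipeline.

```python
# Minimal sketch of the two split protocols quoted in the "Dataset Splits" row.
# Assumes scikit-learn; `samples`/`labels` are placeholder arrays, not the
# paper's actual POJ-104 or OpenCL data.
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

samples = np.random.rand(500, 10)           # placeholder feature matrix
labels = np.random.randint(0, 5, size=500)  # placeholder class labels

# 3:1:1 split into training, validation, and test sets (Section 6.1).
train_x, rest_x, train_y, rest_y = train_test_split(
    samples, labels, train_size=0.6, random_state=0)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, random_state=0)

# Stratified 10-fold cross-validation (Section 6.2).
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(samples, labels)):
    pass  # train on samples[train_idx], evaluate on samples[test_idx]
```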
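The training description quoted in the "Experiment Setup" row maps onto a small recurrent classifier. The sketch below is a hedged Keras/TensorFlow rendering, not the authors' released code (which is at the repository linked above): the layer sizes follow the quoted text, while `SEQ_LEN`, `NUM_CLASSES`, and the use of `tf.keras` are assumptions for illustration.

```python
# Hedged Keras sketch of the classifier described in the "Experiment Setup" row.
# Layer sizes follow the quoted text; SEQ_LEN and NUM_CLASSES are placeholders.
import tensorflow as tf

SEQ_LEN = 256        # placeholder statement-sequence length (not from the paper)
EMBED_DIM = 200      # inst2vec embedding dimension (from the paper)
NUM_CLASSES = 104    # e.g., POJ-104 algorithm classification

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, EMBED_DIM)),  # pre-computed inst2vec embeddings
    tf.keras.layers.LSTM(200, return_sequences=True),   # first of two stacked LSTM layers
    tf.keras.layers.LSTM(200),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy trained with Adam at its default hyperparameters.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```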