PHOG: Probabilistic Model for Code

Authors: Pavol Bielik, Veselin Raychev, Martin Vechev

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast."
Researcher Affiliation | Academia | Pavol Bielik (pavol.bielik@inf.ethz.ch), Veselin Raychev (veselin.raychev@inf.ethz.ch), Martin Vechev (martin.vechev@inf.ethz.ch), Department of Computer Science, ETH Zürich, Switzerland
Pseudocode | No | The paper describes a language (TCOND) and associated procedures, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the PHOG model or its implementation.
Open Datasets | Yes | "In our experiments, we use a corpus of 150,000 de-duplicated and non-obfuscated JavaScript files from GitHub (Raychev et al., 2016)." The corpus is available at http://www.srl.inf.ethz.ch/js150.php.
Dataset Splits | No | The paper states "Two thirds of the data is used for training and the remaining one third is used only for evaluation," but does not explicitly specify a separate validation split (a minimal split sketch follows the table).
Hardware Specification | Yes | "Experiments were done on a 32-core 2.13 GHz Xeon E7-4830 server with 256GB RAM and running Ubuntu 14.04."
Software Dependencies | No | The paper mentions Ubuntu 14.04 and the Acorn parser, but gives no version numbers for the software dependencies of the PHOG implementation or the libraries it uses.
Experiment Setup | Yes | The paper states "We instantiate Ω(p) to return the number of instructions" and "Overall, this search procedure explores 20,000 functions out of which the best one is selected," and it mentions modified Kneser-Ney smoothing and Witten-Bell interpolation smoothing (a smoothing sketch follows the table).
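
The paper reports only the two-thirds/one-third train/evaluation division of the js150 corpus and does not say how the split was drawn. Below is a minimal sketch of such a split; the local path `js150/data` and the fixed seed are assumptions for illustration, not details from the paper.

```python
import glob
import random

# Hypothetical local checkout of the js150 corpus
# (http://www.srl.inf.ethz.ch/js150.php); the path is an assumption.
files = sorted(glob.glob("js150/data/**/*.js", recursive=True))

# Seeded shuffle so the split is repeatable; the paper does not state
# whether its own split was random.
random.Random(0).shuffle(files)

cut = (2 * len(files)) // 3  # two thirds for training
train_files, eval_files = files[:cut], files[cut:]

print(f"{len(train_files)} training files, {len(eval_files)} evaluation files")
```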
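
The Experiment Setup row mentions Witten-Bell interpolation smoothing. As a reference point, here is a minimal sketch of a Witten-Bell interpolated n-gram model over token sequences. It illustrates the standard smoothing technique, not the paper's actual PHOG implementation; the class name and default parameters are invented for the example.

```python
from collections import defaultdict

class WittenBellNGram:
    """Witten-Bell interpolated n-gram model (illustrative sketch)."""

    def __init__(self, order=3, vocab_size=10_000):
        self.order = order
        self.vocab_size = vocab_size    # size of the uniform fallback distribution
        self.counts = defaultdict(int)  # (history, token) -> count
        self.totals = defaultdict(int)  # history -> total count
        self.types = defaultdict(set)   # history -> distinct continuation tokens

    def train(self, tokens):
        for i, w in enumerate(tokens):
            for k in range(self.order):  # histories of length 0..order-1
                if i - k < 0:
                    break
                h = tuple(tokens[i - k:i])
                self.counts[(h, w)] += 1
                self.totals[h] += 1
                self.types[h].add(w)

    def prob(self, history, w):
        h = tuple(history)[-(self.order - 1):] if self.order > 1 else ()
        return self._interp(h, w)

    def _interp(self, h, w):
        if not h:  # unigram level: mix the ML estimate with a uniform prior
            total = self.totals[()]
            if total == 0:
                return 1.0 / self.vocab_size
            lam = total / (total + len(self.types[()]))
            return lam * self.counts[((), w)] / total + (1 - lam) / self.vocab_size
        lower = self._interp(h[1:], w)  # recurse on the shortened history
        total = self.totals[h]
        if total == 0:                  # unseen history: back off entirely
            return lower
        # Witten-Bell weight: lambda = c(h) / (c(h) + number of distinct
        # tokens observed after h)
        lam = total / (total + len(self.types[h]))
        return lam * self.counts[(h, w)] / total + (1 - lam) * lower
```

Usage follows the usual pattern: `model = WittenBellNGram(order=3)`, `model.train(tokens)`, then `model.prob(tokens[-2:], next_token)` to score a candidate continuation.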