CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation

Authors: Pardis Pashakhanloo, Aaditya Naik, Yuepeng Wang, Hanjun Dai, Petros Maniatis, Mayur Naik

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CODETREK on four diverse and challenging Python tasks: variable misuse, exception prediction, unused definition, and variable shadowing. CODETREK achieves an accuracy of 91%, 63%, 98%, and 94% on these tasks respectively, and outperforms state-of-the-art neural models by 2-19 percentage points.
Researcher Affiliation | Collaboration | Pardis Pashakhanloo (University of Pennsylvania); Aaditya Naik (University of Pennsylvania); Yuepeng Wang (Simon Fraser University); Hanjun Dai (Google Research); Petros Maniatis (Google Research); Mayur Naik (University of Pennsylvania)
Pseudocode | Yes | Algorithm 1 (Code2Rel): given a program P, a set of base relation names RB, and a set of derived relation names RQ, construct and return a database D. Algorithm 2 (Rel2Graph): given a database D, construct a program graph G. Algorithm 3 (Graph2Walks): given a program graph G, a walk specification S = ⟨C, B, min, max⟩, and the number of walks w, sample a set of walks W. Algorithm 4 (Code2Walks): given a program P and a task specification T = ⟨RB, RQ, S, n⟩, generate a set of walks W. (A hedged walk-sampling sketch is given after this table.)
Open Source Code | Yes | CODETREK is publicly available at https://github.com/ppashakhanloo/CodeTrek.
Open Datasets | Yes | We use the ETH Py150 Open corpus consisting of 125K Python modules (https://github.com/google-research-datasets/eth_py150_open).
Dataset Splits | Yes | Table 9: The number of samples used for training, validation, and testing and the lines of code that they contain.
Hardware Specification | No | The paper mentions '8 GPUs for distributed synchronized SGD training' but does not specify the model or type of these GPUs or any other specific hardware components.
Software Dependencies | No | The paper lists frameworks and models used (e.g., CuBERT, GREAT, Code2Seq, GGNN) and mentions the 'tensor2tensor package', but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | In this section, we describe details of the parameters and hyperparameters we used. CODETREK: We train CODETREK models with a learning rate of 10^-4, 4 transformer layers, an embedding size of 256, 8 attention heads, and 512 hidden units. We sample 100 walks with lengths of up to 24 in each graph for every task, except for the VARMISUSE-FUN task, for which we sample 500 such walks per graph. (These values are collected into the config sketch after this table.)
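
Illustrative sketch for the Pseudocode row (Algorithm 3, Graph2Walks): the Python snippet below samples biased random walks from a toy program graph. The graph encoding, the bias-by-edge-label weighting, and all identifiers (graph2walks, toy_graph, and the fields standing in for the specification components C, B, min, and max) are assumptions made for illustration; this is a minimal sketch of the idea, not the authors' implementation.

    import random
    from typing import Dict, List, Tuple

    # Hypothetical program graph: node name -> list of (neighbor, edge_label) pairs.
    Graph = Dict[str, List[Tuple[str, str]]]

    def graph2walks(graph: Graph, anchor: str, bias: Dict[str, float],
                    min_len: int, max_len: int, num_walks: int) -> List[List[str]]:
        """Sample num_walks biased random walks starting at the anchor node.

        `bias` maps edge labels to sampling weights, standing in for the walk
        specification's bias component B; a simplification, not Algorithm 3 itself.
        """
        walks = []
        for _ in range(num_walks):
            node, walk = anchor, [anchor]
            target_len = random.randint(min_len, max_len)
            while len(walk) < target_len:
                neighbors = graph.get(node, [])
                if not neighbors:
                    break  # dead end: stop this walk early
                weights = [bias.get(label, 1.0) for _, label in neighbors]
                node = random.choices([n for n, _ in neighbors], weights=weights, k=1)[0]
                walk.append(node)
            walks.append(walk)
        return walks

    # Toy usage: a three-node graph with labeled edges.
    toy_graph = {
        "var_x": [("assign_1", "defined_by")],
        "assign_1": [("var_x", "defines"), ("expr_2", "uses")],
        "expr_2": [("assign_1", "used_in")],
    }
    print(graph2walks(toy_graph, "var_x", {"defined_by": 2.0},
                      min_len=2, max_len=4, num_walks=3))

The paper further serializes sampled walks into sequences consumed by a Transformer encoder; that step is not shown here.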
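
Illustrative sketch for the Experiment Setup row: the quoted hyperparameters gathered into a single configuration dictionary. The key names are hypothetical and chosen for readability; they are not the configuration keys used by the CODETREK repository.

    # Hyperparameters quoted in the Experiment Setup row above, collected into a
    # hypothetical config dict; key names are illustrative, not the repository's.
    codetrek_config = {
        "learning_rate": 1e-4,           # "learning rate of 10^-4"
        "num_transformer_layers": 4,     # "4 transformer layers"
        "embedding_size": 256,           # "an embedding size of 256"
        "num_attention_heads": 8,        # "8 attention heads"
        "hidden_units": 512,             # "512 hidden units"
        "walks_per_graph": 100,          # 500 for the VARMISUSE-FUN task
        "max_walk_length": 24,           # "lengths of up to 24"
    }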