Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Grape: Grammar-Preserving Rule Embedding
Authors: Qihao Zhu, Zeyu Sun, Wenjie Zhang, Yingfei Xiong, Lu Zhang
IJCAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on six widely-used benchmarks containing four context-free languages. The results show that our approach improves the accuracy of the base model by 0.8 to 6.4 percentage points. |
| Researcher Affiliation | Academia | Qihao Zhu, Zeyu Sun, Wenjie Zhang, Yingfei Xiong, Lu Zhang. Key Laboratory of High Confidence Software Technologies, Ministry of Education (Peking University); School of Computer Science, Peking University, 100871, P. R. China |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/pkuzqh/Grape |
| Open Datasets | Yes | We evaluated our approach on six benchmarks, including the HearthStone benchmark [Ling et al., 2016], two semantic parsing benchmarks [Dong and Lapata, 2016], the Django benchmark [Yin and Neubig, 2017], the Concode benchmark [Iyer et al., 2018] and the StrReg benchmark [Ye et al., 2020]. For this task, we adopted the widely used Java benchmark [Alon et al., 2019; Alon et al., 2018], Java-small, which contains 11 relatively large Java projects. |
| Dataset Splits | Yes | The statistics of these datasets are shown in Table 1 (# Train, # Dev, # Test). This dataset contains 691,607 examples in the training set, 23,844 examples in the validation set and 57,088 examples in the test set. |
| Hardware Specification | Yes | It takes 34.78s for an epoch on a single Nvidia Titan RTX with Grape on average, whereas 30.74s without Grape. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and TreeGen as the base model, and refers to official parsers for Python and Java (https://docs.python.org/3/library/ast.html and https://github.com/c2nes/javalang), but does not specify version numbers for these parsers or for general software dependencies. |
| Experiment Setup | Yes | For the hyperparameters of our model, we set the number of iterations N = 9. The hidden sizes were all set to 256. We applied dropout after each iteration of the GNN layer, where the drop rate is 0.15. The model was optimized by Adam with learning rate lr = 0.0001. |
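The hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. This is an illustrative summary only; the class and field names are assumptions and are not taken from the official Grape repository.

```python
# Hyperparameters reported in the paper's experiment setup.
# Names below are illustrative, not from https://github.com/pkuzqh/Grape.
from dataclasses import dataclass


@dataclass(frozen=True)
class GrapeConfig:
    gnn_iterations: int = 9      # number of GNN iterations N
    hidden_size: int = 256       # hidden size used throughout the model
    dropout: float = 0.15        # applied after each GNN iteration
    learning_rate: float = 1e-4  # Adam learning rate


cfg = GrapeConfig()
print(cfg)
```

Such a frozen dataclass keeps the reported settings in one place, which makes it easier to check a reimplementation against the paper.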