Lyra: A Benchmark for Turducken-Style Code Generation

Authors: Qingyuan Liang, Zeyu Sun, Qihao Zhu, Wenjie Zhang, Lian Yu, Yingfei Xiong, Lu Zhang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiment, we adopted Transformer, BERT-style, and GPT-style models as baselines. In the best setting, the generation performance of the GPT-style model is better than the others: the AST exact matching accuracy is 24% and 25.5% when using Chinese and English comments, respectively.
Researcher Affiliation | Academia | 1 Key Laboratory of High Confidence Software Technologies, Ministry of Education (Peking University), School of Computer Science, Peking University, Beijing, PR China; 2 School of Software & Microelectronics, Peking University, Beijing, PR China
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The Lyra dataset and code are available at https://github.com/LIANGQINGYUAN/Lyra.
Open Datasets | Yes | The Lyra dataset and code are available at https://github.com/LIANGQINGYUAN/Lyra.
Dataset Splits | Yes | We randomly selected 10% of the 2,000 examples in our dataset for testing and 10% for validation, and used the remaining 80% for training.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions Python and tools such as Pylint, but does not provide version numbers for these or other software dependencies required for reproducibility.
Experiment Setup | No | The paper lists the models used and the dataset splits, but does not provide specific hyperparameter values or detailed training configurations.
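The headline metric above, AST exact matching accuracy, can be illustrated with a minimal sketch: parse the generated and reference programs into abstract syntax trees and count a hit only when the trees are identical, so that formatting differences are ignored but any semantic difference fails. This uses Python's standard `ast` module and a hypothetical helper name `ast_exact_match`; it is not the paper's implementation, which must also handle the embedded SQL of Lyra's turducken-style programs.

```python
import ast

def ast_exact_match(pred: str, ref: str) -> bool:
    """Return True iff two Python snippets have identical ASTs.

    Whitespace and comments are normalized away by parsing;
    unparsable predictions simply count as mismatches.
    """
    try:
        return ast.dump(ast.parse(pred)) == ast.dump(ast.parse(ref))
    except SyntaxError:
        return False

def ast_accuracy(preds, refs) -> float:
    # Fraction of predictions whose AST exactly matches the reference.
    hits = sum(ast_exact_match(p, r) for p, r in zip(preds, refs))
    return hits / len(refs)
```

For example, `ast_exact_match("x=1", "x = 1")` is True (only spacing differs), while `ast_exact_match("x=1", "x=2")` is False.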
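The 80/10/10 split described in the Dataset Splits row can be sketched as follows. This is an illustrative random split with a fixed seed, assuming the 2,000 examples fit in a list; the function name `split_dataset` and the seed are assumptions, not details from the paper.

```python
import random

def split_dataset(examples, seed=42):
    """Randomly split examples into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split (assumed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = n // 10
    n_valid = n // 10
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test
```

On Lyra's 2,000 examples this yields 1,600 training, 200 validation, and 200 test examples.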