Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Lyra: A Benchmark for Turducken-Style Code Generation
Authors: Qingyuan Liang, Zeyu Sun, Qihao Zhu, Wenjie Zhang, Lian Yu, Yingfei Xiong, Lu Zhang
IJCAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiment, we adopted Transformer, BERT-style, and GPT-style models as baselines. In the best setting, the generation performance of GPT-style models is better than others, where the AST exact matching accuracy is 24% and 25.5% when using Chinese and English comments, respectively. |
| Researcher Affiliation | Academia | (1) Key Laboratory of High Confidence Software Technologies, Ministry of Education (Peking University); School of Computer Science, Peking University, Beijing, PR China. (2) School of Software & Microelectronics, Peking University, Beijing, PR China. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The Lyra dataset and code is available at https://github.com/LIANGQINGYUAN/Lyra. |
| Open Datasets | Yes | The Lyra dataset and code is available at https://github.com/LIANGQINGYUAN/Lyra. |
| Dataset Splits | Yes | We randomly selected the 10% of 2,000 examples in our dataset for testing and validation respectively, and the remaining 80% for training. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions Python and tools like Pylint, but does not provide specific version numbers for these or other software dependencies required for reproducibility. |
| Experiment Setup | No | The paper mentions the models used and dataset splits but does not provide specific hyperparameter values or detailed training configurations for the experimental setup. |
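
The Dataset Splits row describes a random 80/10/10 partition of 2,000 examples. As a minimal sketch of what such a split could look like (the function name `split_indices`, the shuffling procedure, and the seed value are all assumptions for illustration; the table above does not report how the authors drew the split or whether a seed was fixed):

```python
import random


def split_indices(n_examples=2000, test_frac=0.10, valid_frac=0.10, seed=0):
    """Randomly partition example indices into train/valid/test lists.

    Hypothetical helper: the seed and the shuffle-then-slice procedure are
    assumptions, not details taken from the paper.
    """
    indices = list(range(n_examples))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_test = round(n_examples * test_frac)
    n_valid = round(n_examples * valid_frac)

    test = indices[:n_test]
    valid = indices[n_test:n_test + n_valid]
    train = indices[n_test + n_valid:]  # remaining examples
    return train, valid, test


train, valid, test = split_indices()
print(len(train), len(valid), len(test))  # 1600 200 200
```

With the stated proportions, 2,000 examples yield 1,600 for training and 200 each for validation and testing. Reproducing the exact partition used in the paper would still require the split files or seed from the authors' repository.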