Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

Authors: Yaoyu Zhu, Di Huang, Hanqi Lyu, Xiaoyun Zhang, Chongxiao Li, Wenxuan Shi, Yutong Wu, Jianan Mu, Jinghua Wang, Yang Zhao, Pengwei Jin, Shuyao Cheng, Shengwen Liang, Xishan Zhang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section details the implementation of our method and presents comprehensive experimental results. We systematically evaluate our model along multiple dimensions: comparisons with prior state-of-the-art approaches, test-time scaling analysis across varying response-length constraints, ablation studies analyzing the impact of golden-code correctness and problem complexity, the acceleration effect of the adaptive DAPO mechanism, and testbench performance evaluation. These analyses collectively demonstrate the effectiveness and efficiency of our proposed approach.
Researcher Affiliation | Collaboration | 1 State Key Lab of Processors, Institute of Computing Technology, CAS; 2 University of Science and Technology of China; 3 University of Chinese Academy of Sciences; 4 Cambricon Technologies
Pseudocode | Yes | A.2 Algorithm Description of Adaptive DAPO. In this section, we provide the algorithm description of adaptive DAPO in Algorithm 1. In this algorithm, one epoch means a full pass over the training dataset, while one step consists of collecting enough samples and updating the model parameters, as in standard DAPO [44].
Open Source Code | Yes | We have released our model, training code, and dataset to facilitate research in the EDA and LLM communities.
Open Datasets | Yes | We have released our model, training code, and dataset to facilitate research in the EDA and LLM communities.
Dataset Splits | Yes | Specifically, we conduct equivalence checking between the code {y_i} generated by DeepSeek-R1 in the 87K dataset and {y_i} in the original dataset, retaining only validated {(x_i, y_i)} pairs for RL training. ... Through this rigorous selection process, we curate a final dataset of 3.1K high-quality examples for reinforcement learning. For VerilogEval v2, we examine zero-shot scenarios in both the specification-to-RTL translation and code-completion tasks.
Hardware Specification | Yes | The SFT stage is executed on 8 A100-80G GPUs and takes approximately 78 hours, while the RL stage runs on 16 A100-80G GPUs and requires around 127 hours of computation.
Software Dependencies | No | During distillation, we employ LLaMAFactory [50] to perform supervised fine-tuning (SFT) of Qwen2.5-Coder-7B-Instruct on the 87K dataset filtered for distillation. We train the model for 6 epochs with a learning rate of 1e-5 and a batch size of 64. The total context length is set to 16384 tokens during distillation. During RL, we use the verl [32] framework to further train the distilled model with our adaptive DAPO.
Experiment Setup | Yes | During distillation, we employ LLaMAFactory [50] to perform supervised fine-tuning (SFT) of Qwen2.5-Coder-7B-Instruct on the 87K dataset filtered for distillation. We train the model for 6 epochs with a learning rate of 1e-5 and a batch size of 64. The total context length is set to 16384 tokens during distillation. During RL, we use the verl [32] framework to further train the distilled model with our adaptive DAPO. We use a batch size of 128, a learning rate of 1e-6, and train for 300 steps. The rollout temperature is set to 1.0. During this stage, the maximum length is set to 2048 tokens for the instruction and 16384 tokens for the response. The full parameter settings for the SFT (distillation) stage are shown in Table 3, and those for the RL stage are shown in Table 4.
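The Pseudocode row distinguishes an epoch (a full pass over the training data) from a step (collecting enough samples, then updating the parameters). The Python sketch below illustrates that distinction using the dynamic-sampling rule of standard DAPO, which discards prompts whose rollouts all receive the same reward because such groups have zero group-relative advantage. It does not reproduce the paper's adaptive mechanism; `rollout_rewards` and `update` are hypothetical callbacks standing in for rollout generation and the policy update.

```python
import random
from typing import Callable, List, Sequence


def is_informative(rewards: Sequence[float]) -> bool:
    """Keep a prompt only if its rollouts disagree (DAPO-style dynamic sampling).

    If every rollout gets the same reward (all correct or all wrong), the
    group-relative advantage is zero and the group adds no gradient signal.
    """
    return len(set(rewards)) > 1


def run_epoch(
    prompts: List[str],
    rollout_rewards: Callable[[str], List[float]],  # hypothetical: sample rollouts, return their rewards
    update: Callable[[List[str]], None],            # hypothetical: one policy-gradient update
    batch_size: int,
) -> int:
    """One epoch = one pass over the dataset; one step = fill a batch, then update.

    Returns the number of optimizer steps taken during this epoch.
    """
    order = prompts[:]
    random.shuffle(order)
    buffer: List[str] = []
    steps = 0
    for prompt in order:
        if is_informative(rollout_rewards(prompt)):
            buffer.append(prompt)
        if len(buffer) == batch_size:
            update(buffer)  # one step: update on a full batch of informative groups
            buffer = []
            steps += 1
    # In this sketch, a final partially filled batch is simply dropped.
    return steps
```

Because uninformative prompts are filtered out, the number of steps per epoch depends on how many prompts survive the filter, not only on the dataset size.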
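The Dataset Splits row describes keeping only pairs whose DeepSeek-R1 code passes an equivalence check against the original code. The sketch below shows that filtering logic under a loud simplification: each design is a hypothetical Python function over single-bit inputs, and equivalence is checked by exhaustive enumeration, which only works for small combinational logic. Real Verilog would need a formal equivalence checker or simulation, and returning the reference code rather than the generated code is an assumption of this sketch.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical stand-in for a hardware design: a combinational function that
# maps single-bit inputs to an integer output. Not how real Verilog is checked.
Design = Callable[..., int]
# (spec, generated design, reference design, number of 1-bit inputs)
Sample = Tuple[str, Design, Design, int]


def equivalent(a: Design, b: Design, n_inputs: int) -> bool:
    """Exhaustively compare two designs on all 2**n_inputs input combinations."""
    return all(a(*bits) == b(*bits) for bits in product((0, 1), repeat=n_inputs))


def filter_pairs(samples: List[Sample]) -> List[Tuple[str, Design]]:
    """Keep (spec, reference) only when the generated design matches the reference."""
    return [
        (spec, reference)
        for spec, generated, reference, n_inputs in samples
        if equivalent(generated, reference, n_inputs)
    ]


def xor_ref(a: int, b: int) -> int:
    return a ^ b


def xor_gen(a: int, b: int) -> int:
    return (a | b) & ~(a & b) & 1  # XOR written differently: equivalent


def and_gen(a: int, b: int) -> int:
    return a & b  # wrong: AND instead of XOR


kept = filter_pairs([("xor", xor_gen, xor_ref, 2), ("xor-wrong", and_gen, xor_ref, 2)])
print([spec for spec, _ in kept])  # ['xor']
```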
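The Hardware Specification row gives GPU counts and approximate wall-clock times, which can be combined into a rough compute budget in GPU-hours (GPUs multiplied by hours). This is a back-of-the-envelope estimate from the quoted figures only; since the durations are approximate, so are the totals.

```python
# Figures quoted in the Hardware Specification row.
STAGES = {
    "SFT": (8, 78),   # 8x A100-80G for ~78 h
    "RL": (16, 127),  # 16x A100-80G for ~127 h
}


def gpu_hours(gpus: int, hours: float) -> float:
    """Compute budget of one stage, in GPU-hours."""
    return gpus * hours


per_stage = {name: gpu_hours(g, h) for name, (g, h) in STAGES.items()}
total = sum(per_stage.values())
print(per_stage, total)  # {'SFT': 624, 'RL': 2032} 2656
```

By this estimate, the RL stage accounts for roughly three quarters of the reported A100 time.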
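The hyperparameters in the Software Dependencies and Experiment Setup rows can be collected into two small config objects, which makes a few derived quantities explicit. This is a sketch with hypothetical class and field names, not the LLaMAFactory or verl configuration schema. It assumes that "87K" means exactly 87,000 examples, that the SFT batch size of 64 is the global batch size with no dropped last batch, and that the RL batch size of 128 counts prompts per step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTConfig:
    num_examples: int            # assumption: "87K" taken as exactly 87,000
    epochs: int = 6
    learning_rate: float = 1e-5
    batch_size: int = 64         # assumed to be the global (effective) batch size
    context_length: int = 16384  # tokens

    def optimizer_steps(self) -> int:
        # ceil: a final partial batch still counts as one step
        return math.ceil(self.num_examples / self.batch_size) * self.epochs


@dataclass(frozen=True)
class RLConfig:
    batch_size: int = 128        # assumed to count prompts per step
    learning_rate: float = 1e-6
    steps: int = 300
    temperature: float = 1.0
    max_prompt_length: int = 2048
    max_response_length: int = 16384

    def max_sequence_length(self) -> int:
        # prompt and response budgets together bound one rollout sequence
        return self.max_prompt_length + self.max_response_length

    def prompts_seen(self) -> int:
        return self.batch_size * self.steps


sft = SFTConfig(num_examples=87_000)
rl = RLConfig()
print(sft.optimizer_steps(), rl.max_sequence_length(), rl.prompts_seen())  # 8160 18432 38400
```

Under these assumptions, 38,400 prompt draws correspond to roughly 12 passes over the 3.1K-example RL dataset, ignoring any prompts discarded by dynamic sampling.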