Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code

Authors: Raghu Vamshi Hemadri, Jitendra Bhandari, Andre Nakkab, Johann Knechtel, Badri Gopalan, Ramesh Narayanaswamy, Ramesh Karri, Siddharth Garg

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We begin by discussing VeriLoC's performance on line-level classification for both congestion and timing prediction. Table 2 tabulates our results for three different classification heads, as well as different context lengths (p = {1, 3, 5}). For congestion and timing classification tasks, we use the F1-score, precision, and recall to measure the balance between sensitivity and specificity. For the regression task of WNS prediction, we employ R² and mean absolute percentage error (MAPE)
Researcher Affiliation	Collaboration	Raghu Vamshi Hemadri¹, Jitendra Bhandari¹, Andre Nakkab¹, Johann Knechtel², Badri P Gopalan³, Ramesh Narayanaswamy³, Ramesh Karri¹, Siddharth Garg¹. ¹New York University Tandon School of Engineering, ²New York University Abu Dhabi, ³Synopsys
Pseudocode	No	The paper describes its methodology using text and diagrams (e.g., Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code	Yes	Overall, VeriLoC⁴ establishes an entirely new approach for early-stage prediction from RTL code, which might be of value not only to other prediction tasks, but also for code and design optimization. ⁴https://github.com/ML4EDA/VeriLoC.git and The code required to reproduce the results has been open-sourced and is accessible at: https://github.com/ML4EDA/VeriLoC.git.
Open Datasets	Yes	We use the popular OpenABC-D [18] RTL/Verilog code dataset for our experiments, using various Verilog modules from all projects in the dataset. [18] A. B. Chowdhury, B. Tan, R. Karri, and S. Garg, "OpenABC-D: A large-scale dataset for machine learning guided integrated circuit synthesis," 2021.
Dataset Splits	Yes	We employed an 80/20 random split of the dataset to obtain training vs. test data.
Hardware Specification	Yes	Hardware. CL-Verilog feature extraction was performed on a single NVidia H100, and downstream classifiers (XGBoost and LightGBM) were trained/evaluated on a CPU machine with 32GB RAM and 8 CPU cores. The FNN model was trained/evaluated using an NVidia RTX 8000 GPU.
Software Dependencies	No	The paper mentions specific software frameworks like XGBoost [19], LightGBM [20], CL-Verilog [14], and Synopsys RTL Architect [13], but it does not provide specific version numbers for these or any other ancillary software components required to replicate the experiments.
Experiment Setup	Yes	Hyperparameter Setting. For congestion and timing detection, we employed XGBoost [19], LightGBM [20], and an FNN, each tuned to handle class imbalance and optimize predictive performance. XGBoost was configured with default hyperparameters: scale_pos_weight set as the ratio of the majority to minority class to mitigate imbalance, max_depth=30, learning_rate=0.05 and n_estimators=500. LightGBM used is_unbalance=True for automatic class weight adjustment, and default settings of num_leaves=100, learning_rate=0.05, and feature_fraction=0.8. The regression head followed a similar training procedure using the XGBRegressor with a squared error loss. The FNN consisted of a single sigmoid neuron trained with binary cross-entropy (BCE) loss and was optimized using Adam with a learning rate of 1e-4.
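
The split and hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the helper names (split_80_20, majority_minority_ratio, xgb_classifier_params), the random seed, and the dictionary layout are assumptions, while the numeric values are the ones quoted above. The sketch uses only the Python standard library, so no XGBoost or LightGBM installation is required to run it.

```python
import random
from collections import Counter


def split_80_20(items, seed=0):
    """Random 80/20 train/test split. The seed is an assumption; the
    quoted text does not report one."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (4 * len(shuffled)) // 5  # first 80% goes to training
    return shuffled[:cut], shuffled[cut:]


def majority_minority_ratio(labels):
    """Ratio of majority to minority class counts, the value quoted
    for XGBoost's scale_pos_weight. Expects binary labels."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def xgb_classifier_params(train_labels):
    """XGBoost classifier settings as quoted in the Experiment Setup row."""
    return {
        "scale_pos_weight": majority_minority_ratio(train_labels),
        "max_depth": 30,
        "learning_rate": 0.05,
        "n_estimators": 500,
    }


# LightGBM classifier settings as quoted in the Experiment Setup row.
LGBM_CLASSIFIER_PARAMS = {
    "is_unbalance": True,
    "num_leaves": 100,
    "learning_rate": 0.05,
    "feature_fraction": 0.8,
}

# Regression head: XGBRegressor with a squared error loss. Other
# settings are described only as "similar", so none are filled in here.
XGB_REGRESSOR_PARAMS = {"objective": "reg:squarederror"}

# FNN head: one sigmoid neuron, BCE loss, Adam with learning rate 1e-4.
FNN_SETTINGS = {
    "output_units": 1,
    "activation": "sigmoid",
    "loss": "binary_cross_entropy",
    "optimizer": "adam",
    "learning_rate": 1e-4,
}
```

With the libraries installed, dictionaries of this shape would typically be unpacked into the scikit-learn style wrappers, for example xgboost.XGBClassifier(**xgb_classifier_params(y_train)) or lightgbm.LGBMClassifier(**LGBM_CLASSIFIER_PARAMS). Because the paper does not report library versions, results obtained this way may differ from the published numbers.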