Towards Automated RISC-V Microarchitecture Design with Reinforcement Learning

Authors: Chen Bai, Jianwang Zhai, Yuzhe Ma, Bei Yu, Martin D. F. Wong

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments using commercial electronic design automation (EDA) tools show that our method achieves an average PPA trade-off improvement of 16.03% over previous state-of-the-art approaches with 4.07× higher efficiency. The solution qualities outperform human implementations by up to 2.03× in the PPA trade-off.
Researcher Affiliation | Academia | Chen Bai¹, Jianwang Zhai², Yuzhe Ma³, Bei Yu¹, Martin D. F. Wong⁴. ¹The Chinese University of Hong Kong; ²Beijing University of Posts and Telecommunications; ³The Hong Kong University of Science and Technology (Guangzhou); ⁴Hong Kong Baptist University
Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/baichen318/rl-explorer.
Open Datasets | No | We use towers, vvadd, spmv from official RISC-V tests as workloads in the DSE.
Dataset Splits | No | The paper mentions the use of training data for PPA model calibration but does not explicitly describe train/validation/test splits for the RL agent's DSE task. It states: 'By leveraging around 800 to 900 SonicBOOM microarchitecture designs, the Kendall τ for PPA modeling results can achieve higher than 0.92.'
Hardware Specification | Yes | All experiments are conducted on 80 Quad Intel(R) Xeon(R) CPU E7-4820 V3 cores with a 1 TB main memory.
Software Dependencies | Yes | Specifically, the performance, power, and area values are obtained from Synopsys VCS M-2017.03, Synopsys PrimeTime PX R-2020.09-SP1, and Cadence Genus 18.12-e012 with 7-nm technology (Clark et al. 2016).
Experiment Setup | Yes | The coefficient κ in Equation (5) is set to 1, ρ in Equation (6) is 0.5, λ in Equation (7) is 0.95, and the discount factor ζ in Equation (7) is 0.99. The Adam optimizer is used, and the initial learning rate is 0.001.
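
The Kendall τ quoted in the Dataset Splits entry measures how well the ranking of predicted PPA values agrees with the ranking of measured ones. The following is a minimal sketch of that metric, not the authors' code. It assumes the tau-a variant (no tie correction), since this summary does not say which variant the paper uses; the function name `kendall_tau_a` and the five sample values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_a(predicted, measured):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant (assumption:
    the paper's exact variant is not stated in this summary).
    """
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two designs to rank")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both sequences order designs i and j the same way.
        sign = (predicted[i] - predicted[j]) * (measured[i] - measured[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for five designs (illustrative only, not from the paper).
predicted_ppa = [0.91, 0.75, 0.62, 0.88, 0.40]
measured_ppa = [0.95, 0.70, 0.73, 0.85, 0.35]
tau = kendall_tau_a(predicted_ppa, measured_ppa)
print(f"Kendall tau-a = {tau:.2f}")
```

A value of 1 means the model ranks every pair of designs in the same order as the measurements, and -1 means every pair is reversed. In this toy data one of the ten pairs is ordered differently, so the score falls below 1.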