Reinforcement learning for optimization of variational quantum circuit architectures
Authors: Mateusz Ostaszewski, Lea M. Trenkwalder, Wojciech Masarczyk, Eleanor Scerri, Vedran Dunjko
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the performance of our algorithm on the problem of estimating the ground-state energy of lithium hydride (LiH) in various configurations. In this well-known benchmark problem, we achieve chemical accuracy and state-of-the-art results in terms of circuit depth. (Section 4, Experiments) |
| Researcher Affiliation | Academia | Mateusz Ostaszewski, Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland (mm.ostaszewski@gmail.com); Lea M. Trenkwalder, Institute for Theoretical Physics, University of Innsbruck, Innsbruck, Austria (lea.trenkwalder@uibk.ac.at); Wojciech Masarczyk, Warsaw University of Technology, Warsaw, Poland (wojciech.masarczyk@gmail.com); Eleanor Scerri, Leiden University, Leiden, The Netherlands (scerri@lorentz.leidenuniv.nl); Vedran Dunjko, Leiden University, Leiden, The Netherlands (v.dunjko@liacs.leidenuniv.nl) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available on https://github.com/mostaszewski314/RL_for_optimization_of_VQE_circuit_architectures |
| Open Datasets | Yes | All Hamiltonians were generated using the Qiskit library [33]. |
| Dataset Splits | No | The paper describes the reinforcement learning training process, including "training episode" and "testing phase", but does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets) as it generates data dynamically within the RL environment. |
| Hardware Specification | Yes | All experiments were performed on three computing clusters with 4 Titan RTX GPUs, 4 Titan V GPUs, and 4 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions the use of "Qiskit library [33]" and "Qulacs library [35]", but it does not specify their version numbers, which are required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | In all experiments we utilize the n-step DDQN algorithm, with the discount factor set to γ = 0.88; the probability of selecting a random action is governed by an ε-greedy policy, with ε decayed at each step by a factor of 0.99995 from its initial value ε = 1 down to a minimal value ε = 0.05. The memory replay buffer size is set to 20,000. The target network in the DDQN training procedure is updated after every 500 actions. The employed network is a fully connected network with 5 hidden layers of 1000 neurons each for the 4-qubit case and 2000 neurons each for the 6-qubit case. The maximal number of gates is 40. |
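The Hamiltonian generation quoted in the Open Datasets row can be sketched in a few lines of Qiskit. The example below uses the current Qiskit Nature interface rather than the 2021-era `qiskit.chemistry` module the paper likely relied on, and the bond length, basis set, and parity mapping are illustrative assumptions rather than settings confirmed by the paper.

```python
# Hedged sketch: generating a LiH qubit Hamiltonian with Qiskit Nature.
# Geometry, basis set, and mapper are assumptions for illustration; the paper's
# exact reductions (frozen orbitals, resulting qubit counts) are not reproduced here.
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import ParityMapper

driver = PySCFDriver(atom="Li 0 0 0; H 0 0 1.6", basis="sto3g")  # assumed bond length (angstrom)
problem = driver.run()                                            # electronic-structure problem
fermionic_op = problem.hamiltonian.second_q_op()                  # second-quantized Hamiltonian
mapper = ParityMapper(num_particles=problem.num_particles)        # parity mapping with two-qubit reduction
qubit_hamiltonian = mapper.map(fermionic_op)                      # Pauli-sum operator for VQE
print(qubit_hamiltonian.num_qubits, "qubits")
```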
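The hyperparameters quoted in the Experiment Setup row map directly onto a training configuration. The following minimal PyTorch sketch records them as constants; the state and action dimensions, the ReLU activation, and the helper names are assumptions, since they depend on the circuit encoding the agent observes.

```python
import torch.nn as nn

# Hyperparameters as quoted in the Experiment Setup row.
GAMMA = 0.88                         # n-step DDQN discount factor
EPS_START, EPS_MIN = 1.0, 0.05       # initial and minimal exploration rates
EPS_DECAY = 0.99995                  # multiplicative decay applied at every step
REPLAY_BUFFER_SIZE = 20_000          # memory replay buffer size
TARGET_UPDATE_EVERY = 500            # actions between target-network updates
MAX_GATES = 40                       # cap on the number of gates per circuit
HIDDEN_LAYERS = 5
HIDDEN_WIDTH = {4: 1000, 6: 2000}    # neurons per hidden layer, keyed by qubit count


def epsilon_at_step(step: int) -> float:
    """Exploration rate after `step` decay applications (hypothetical helper)."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** step)


def build_q_network(n_qubits: int, state_dim: int, n_actions: int) -> nn.Sequential:
    """Fully connected Q-network: 5 hidden layers of 1000 (4-qubit) or 2000
    (6-qubit) units; state_dim, n_actions, and ReLU are assumptions."""
    width = HIDDEN_WIDTH[n_qubits]
    layers, in_dim = [], state_dim
    for _ in range(HIDDEN_LAYERS):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, n_actions))
    return nn.Sequential(*layers)
```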