Reinforcement learning for optimization of variational quantum circuit architectures
Authors: Mateusz Ostaszewski, Lea M. Trenkwalder, Wojciech Masarczyk, Eleanor Scerri, Vedran Dunjko
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the performance of our algorithm on the problem of estimating the ground-state energy of lithium hydride (LiH) in various configurations. In this well-known benchmark problem, we achieve chemical accuracy and state-of-the-art results in terms of circuit depth. (Section 4, Experiments) |
| Researcher Affiliation | Academia | Mateusz Ostaszewski, Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland (mm.ostaszewski@gmail.com); Lea M. Trenkwalder, Institute for Theoretical Physics, University of Innsbruck, Innsbruck, Austria (lea.trenkwalder@uibk.ac.at); Wojciech Masarczyk, Warsaw University of Technology, Warsaw, Poland (wojciech.masarczyk@gmail.com); Eleanor Scerri, Leiden University, Leiden, The Netherlands (scerri@lorentz.leidenuniv.nl); Vedran Dunjko, Leiden University, Leiden, The Netherlands (v.dunjko@liacs.leidenuniv.nl) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available on https://github.com/mostaszewski314/RL_for_optimization_of_VQE_circuit_architectures |
| Open Datasets | Yes | All Hamiltonians were generated using the Qiskit library [33]. |
| Dataset Splits | No | The paper describes the reinforcement learning training process, including "training episode" and "testing phase", but does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets) as it generates data dynamically within the RL environment. |
| Hardware Specification | Yes | All experiments were performed on three computing clusters with 4 Titan RTX GPUs, 4 Titan V GPUs, and 4 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions the use of "Qiskit library [33]" and "Qulacs library [35]", but it does not specify their version numbers, which are required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | In all experiments we utilize the n-step DDQN algorithm, with the discount factor set to γ = 0.88; the probability of selecting a random action is governed by an ε-greedy policy, with ε decayed at each step by a factor of 0.99995 from its initial value ε = 1 down to a minimal value ε = 0.05. The memory replay buffer size is set to 20,000. The target network in the DDQN training procedure is updated after every 500 actions. The employed network is a fully connected network with 5 hidden layers of 1000 neurons each for the 4-qubit case and 2000 neurons each for the 6-qubit case. The maximal number of gates is 40. |
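The Hamiltonian generation quoted in the Open Datasets row can be sketched in a few lines of Qiskit. The example below uses the current Qiskit Nature interface rather than the 2021-era `qiskit.chemistry` module the paper likely relied on, and the bond length, basis set, and parity mapping are illustrative assumptions rather than settings confirmed by the paper.

```python
# Hedged sketch: generating a LiH qubit Hamiltonian with Qiskit Nature.
# Geometry, basis set, and mapper are assumptions for illustration; the paper's
# exact reductions (frozen orbitals, resulting qubit counts) are not reproduced here.
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import ParityMapper

driver = PySCFDriver(atom="Li 0 0 0; H 0 0 1.6", basis="sto3g")  # assumed bond length (angstrom)
problem = driver.run()                                            # electronic-structure problem
fermionic_op = problem.hamiltonian.second_q_op()                  # second-quantized Hamiltonian
mapper = ParityMapper(num_particles=problem.num_particles)        # parity mapping with two-qubit reduction
qubit_hamiltonian = mapper.map(fermionic_op)                      # Pauli-sum operator for VQE
print(qubit_hamiltonian.num_qubits, "qubits")
```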
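The hyperparameters quoted in the Experiment Setup row map directly onto a training configuration. The following minimal PyTorch sketch records them as constants; the state and action dimensions, the ReLU activation, and the helper names are assumptions, since they depend on the circuit encoding the agent observes.

```python
import torch.nn as nn

# Hyperparameters as quoted in the Experiment Setup row.
GAMMA = 0.88                         # n-step DDQN discount factor
EPS_START, EPS_MIN = 1.0, 0.05       # initial and minimal exploration rates
EPS_DECAY = 0.99995                  # multiplicative decay applied at every step
REPLAY_BUFFER_SIZE = 20_000          # memory replay buffer size
TARGET_UPDATE_EVERY = 500            # actions between target-network updates
MAX_GATES = 40                       # cap on the number of gates per circuit
HIDDEN_LAYERS = 5
HIDDEN_WIDTH = {4: 1000, 6: 2000}    # neurons per hidden layer, keyed by qubit count


def epsilon_at_step(step: int) -> float:
    """Exploration rate after `step` decay applications (hypothetical helper)."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** step)


def build_q_network(n_qubits: int, state_dim: int, n_actions: int) -> nn.Sequential:
    """Fully connected Q-network: 5 hidden layers of 1000 (4-qubit) or 2000
    (6-qubit) units; state_dim, n_actions, and ReLU are assumptions."""
    width = HIDDEN_WIDTH[n_qubits]
    layers, in_dim = [], state_dim
    for _ in range(HIDDEN_LAYERS):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, n_actions))
    return nn.Sequential(*layers)
```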