Parametrized Quantum Policies for Reinforcement Learning
Authors: Sofiene Jerbi, Casper Gyurik, Simon Marshall, Hans Briegel, Vedran Dunjko
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a hybrid quantum-classical reinforcement learning model using very few qubits, which we show can be effectively trained to solve several standard benchmarking environments. In this section, we evaluate the influence of these design choices on learning performance through numerical simulations. We consider three classical benchmarking environments from the OpenAI Gym library [24]: CartPole, MountainCar and Acrobot. |
| Researcher Affiliation | Academia | Sofiene Jerbi, Institute for Theoretical Physics, University of Innsbruck (sofiene.jerbi@uibk.ac.at); Casper Gyurik, LIACS, Leiden University; Simon C. Marshall, LIACS, Leiden University; Hans J. Briegel, Institute for Theoretical Physics, University of Innsbruck; Vedran Dunjko, LIACS, Leiden University |
| Pseudocode | Yes | Algorithm 1: REINFORCE with PQC policies and value-function baselines. (A hedged classical sketch of this update is given after the table.) |
| Open Source Code | Yes | An accompanying tutorial [36], implemented as part of the quantum machine learning library TensorFlow Quantum [37], provides the code required to reproduce our numerical results and explore different settings. (A minimal Cirq sketch of the circuit structure appears after the table.) |
| Open Datasets | Yes | We consider three classical benchmarking environments from the OpenAI Gym library [24]: CartPole, MountainCar and Acrobot. (An episode-generation sketch using the Gym interface appears after the table.) |
| Dataset Splits | No | The paper describes training and evaluating agents in reinforcement learning environments by generating episodes, but it does not specify explicit validation dataset splits (e.g., percentages or counts for a separate validation set) as typically found in supervised learning. |
| Hardware Specification | No | The paper mentions: "The computational results presented here have been achieved in part using the LEO HPC infrastructure of the University of Innsbruck." and in Appendix D.1: "Our numerical simulations were performed on a CPU-based cluster." These statements indicate the type of computing environment but do not provide specific details such as CPU/GPU models, memory, or number of cores. |
| Software Dependencies | No | The paper mentions the use of "TensorFlow Quantum [37]" but does not specify its version number or list other software dependencies with their specific versions. |
| Experiment Setup | Yes | Apart from the PQC depth, the shared hyperparameters of these two models were jointly picked so as to give the best overall performance for both; the hyperparameters specific to each model were optimized independently. In our hyperparameter search, we evaluated the performance of DNNs with a wide range of depths (number of hidden layers between 2 and 10) and widths (number of units per hidden layer between 8 and 64), and kept the architecture with the best average performance (depth 4, width 16). (A Keras sketch of the selected DNN baseline appears after the table.) |
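
The three benchmark environments named above (CartPole, MountainCar, Acrobot) are accessed through the OpenAI Gym interface. As a point of reference, the following is a minimal episode-generation sketch; it assumes the maintained Gymnasium fork (older `gym` releases return fewer values from `reset`/`step`) and uses a placeholder random policy, so it is not the paper's training code.

```python
# Minimal sketch: roll out one episode in a Gym benchmark environment.
# Assumes the Gymnasium fork; standard IDs are "CartPole-v1",
# "MountainCar-v0" and "Acrobot-v1".
import gymnasium as gym

def run_episode(env_id="CartPole-v1", policy=None, max_steps=500):
    """Collect (state, action, reward) tuples for a single episode."""
    env = gym.make(env_id)
    state, _ = env.reset()
    trajectory = []
    for _ in range(max_steps):
        # Placeholder random policy if none is supplied.
        action = env.action_space.sample() if policy is None else policy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if terminated or truncated:
            break
    env.close()
    return trajectory

if __name__ == "__main__":
    episode = run_episode()
    print(f"length={len(episode)}, return={sum(r for _, _, r in episode)}")
```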
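The policy model is a parametrized quantum circuit with alternating variational and encoding layers (data re-uploading); the authoritative implementation is the TensorFlow Quantum tutorial [36, 37]. Below is a hedged sketch in plain Cirq of such a circuit for a 4-dimensional state (as in CartPole). The layer layout, the choice of single-qubit Z observables, and the softmax readout are illustrative simplifications, not the paper's exact architecture.

```python
# Hedged sketch of an alternating-layer, data re-uploading PQC policy in Cirq.
import cirq
import numpy as np
import sympy

def pqc_policy_circuit(n_qubits=4, n_layers=2):
    """Build alternating variational / entangling / encoding layers."""
    qubits = cirq.LineQubit.range(n_qubits)
    thetas = sympy.symbols(f"theta0:{2 * n_qubits * n_layers}")  # variational angles
    xs = sympy.symbols(f"x0:{n_qubits}")                          # encoded state features
    circuit = cirq.Circuit()
    theta_iter = iter(thetas)
    for _ in range(n_layers):
        for q in qubits:  # single-qubit variational rotations
            circuit.append([cirq.rz(next(theta_iter))(q), cirq.ry(next(theta_iter))(q)])
        for i in range(n_qubits):  # ring of entangling gates
            circuit.append(cirq.CZ(qubits[i], qubits[(i + 1) % n_qubits]))
        for q, x in zip(qubits, xs):  # re-upload the (scaled) input features
            circuit.append(cirq.rx(x)(q))
    return circuit, qubits, thetas, xs

circuit, qubits, thetas, xs = pqc_policy_circuit()
symbol_values = np.random.uniform(-np.pi, np.pi, len(thetas) + len(xs))
resolver = cirq.ParamResolver({s: float(v) for s, v in zip((*thetas, *xs), symbol_values)})
# Expectation values of single-qubit Z observables, passed through a softmax,
# play the role of unnormalized action preferences in this simplified readout.
exp_vals = np.real(cirq.Simulator().simulate_expectation_values(
    circuit, observables=[cirq.Z(q) for q in qubits], param_resolver=resolver))
action_probs = np.exp(exp_vals) / np.sum(np.exp(exp_vals))
print(action_probs)
```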
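Algorithm 1 is REINFORCE with a value-function baseline, where the policy happens to be a PQC. The sketch below shows only the classical update skeleton under stated assumptions: `policy` and `value_fn` stand for any Keras models mapping states to action probabilities and to a scalar baseline, and the optimizers and the small log-clipping constant are illustrative choices not taken from the paper.

```python
# Hedged sketch of REINFORCE with a value-function baseline (classical skeleton).
import numpy as np
import tensorflow as tf

def discounted_returns(rewards, gamma=0.99):
    """G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_update(policy, value_fn, opt_pi, opt_v, states, actions, rewards, gamma=0.99):
    """One gradient step on a batch of episodes, using V(s) as the baseline."""
    states = tf.convert_to_tensor(np.concatenate(states), dtype=tf.float32)
    actions = tf.convert_to_tensor(np.concatenate(actions), dtype=tf.int32)
    returns = tf.convert_to_tensor(
        np.concatenate([discounted_returns(r, gamma) for r in rewards]), dtype=tf.float32)

    with tf.GradientTape(persistent=True) as tape:
        baselines = tf.squeeze(value_fn(states), axis=-1)
        advantages = tf.stop_gradient(returns - baselines)
        probs = policy(states)  # shape (batch, n_actions)
        idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
        log_probs = tf.math.log(tf.gather_nd(probs, idx) + 1e-8)
        policy_loss = -tf.reduce_mean(log_probs * advantages)
        value_loss = tf.reduce_mean(tf.square(returns - baselines))

    opt_pi.apply_gradients(zip(tape.gradient(policy_loss, policy.trainable_variables),
                               policy.trainable_variables))
    opt_v.apply_gradients(zip(tape.gradient(value_loss, value_fn.trainable_variables),
                              value_fn.trainable_variables))
    del tape
    return float(policy_loss), float(value_loss)
```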
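For the classical comparison, the hyperparameter search settled on a DNN with 4 hidden layers of 16 units each. A minimal Keras sketch of such a policy network is shown below; the ReLU activation and softmax output head are assumptions, as the excerpt does not specify them.

```python
# Hedged sketch of the selected classical DNN baseline (depth 4, width 16).
import tensorflow as tf

def dnn_policy(state_dim=4, n_actions=2, depth=4, width=16):
    """Fully connected policy network; activations are assumed, not from the paper."""
    model = tf.keras.Sequential(
        [tf.keras.layers.Dense(width, activation="relu") for _ in range(depth)]
        + [tf.keras.layers.Dense(n_actions, activation="softmax")]
    )
    model.build(input_shape=(None, state_dim))
    return model

policy = dnn_policy()
policy.summary()
```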