Quantum algorithms for reinforcement learning with a generative model

Authors: Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin Roetteler

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | For such an MDP, we design quantum algorithms that approximate an optimal policy (π*), the optimal value function (v*), and the optimal Q-function (q*), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy (ϵ) and two main parameters of the MDP: the effective time horizon (1/(1−γ)) and the size of the action space (A). Moreover, we show that our quantum algorithm for computing q* is optimal by proving a matching quantum lower bound.
Researcher Affiliation | Collaboration | 1University of Maryland, 2Microsoft Quantum, 3Microsoft. Correspondence to: Daochen Wang <wdaochen@gmail.com>, Aarthi Sundaram <aarthi.sundaram@microsoft.com>.
Pseudocode | Yes | Algorithm 1 (Solve Mdp1(M, ϵ, δ)) and Algorithm 2 (Solve Mdp2(M, ϵ, δ)) are provided.
Open Source Code | No | The paper does not provide any specific links or statements about the availability of open-source code for the methodology described.
Open Datasets | No | The paper assumes a generative model of sampling, in which a simulator draws samples, rather than using a publicly available or open dataset for training experiments.
Dataset Splits | No | The paper does not discuss dataset splits for training, validation, or testing, as it focuses on theoretical algorithm design and analysis using a generative model.
Hardware Specification | No | The paper focuses on theoretical quantum algorithms and their sample complexity. It does not provide specific hardware details (such as GPU/CPU models or processor types) used for running experiments.
Software Dependencies | No | The paper describes theoretical algorithms and their complexity. It does not provide specific ancillary software details with version numbers (e.g., libraries, solvers) needed to replicate an experiment.
Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis. While it defines input parameters for its algorithms (e.g., ϵ, δ), it does not describe an experimental setup with specific hyperparameters or system-level training settings as would be found in empirical work.
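For context on the abstract quoted above: the paper's quantum algorithms are described as "inspired by value iteration," the classical dynamic-programming routine that repeatedly applies the Bellman optimality operator. The following is a minimal classical sketch on a hypothetical 2-state, 2-action MDP (the transition tensor P and reward matrix R below are illustrative, not from the paper), showing where the quantities π*, v*, q*, ϵ, and γ enter.

```python
# Minimal classical value iteration, the subroutine the paper's quantum
# algorithms are inspired by. The toy MDP (P, R) is hypothetical.
import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-6):
    """P[s, a, s'] = transition probability, R[s, a] = expected reward.
    Repeats the Bellman backup until the value function stops changing
    by more than eps in sup-norm; returns (v*, greedy policy pi*)."""
    S, A, _ = P.shape
    v = np.zeros(S)
    while True:
        # q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * v[s']
        q = R + gamma * (P @ v)
        v_new = q.max(axis=1)          # Bellman optimality backup
        if np.max(np.abs(v_new - v)) < eps:
            return v_new, q.argmax(axis=1)
        v = v_new

# Toy MDP: in state 0, action 0 self-loops (reward 0) and action 1 moves
# to state 1 (reward 1); state 1 is absorbing with reward 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
v, pi = value_iteration(P, R)
# v ≈ [1, 0]: taking action 1 from state 0 earns reward 1, then nothing.
```

The paper's quadratic speedups are stated in exactly the parameters visible here: the accuracy ϵ, the effective horizon 1/(1−γ) governing how many backups are needed, and the action-space size A scanned by each max.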