Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Authors: Tianhao Wang, Dongruo Zhou, Quanquan Gu

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In Section 6 we present the numerical experiment which supports our theory. |
| Researcher Affiliation | Academia | Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, tianhao.wang@yale.edu; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, drzhou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 LSVI-UCB-Batch |
| Open Source Code | No | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | No | We run our algorithms, LSVI-UCB-Batch and LSVI-UCB-Rare Switch, on a synthetic linear MDP given in Example 6.1, and compare them with the fully adaptive baseline, LSVI-UCB (Jin et al., 2020). |
| Dataset Splits | No | The paper uses a synthetic MDP and evaluates performance by regret over episodes; it does not describe training, validation, or test splits. |
| Hardware Specification | Yes | All experiments are performed on a PC with Intel i7-9700K CPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiment, we set H = 10, K = 2500, δ = 0.35 and d = 13, thus A contains 1024 actions. [...] In detail, for LSVI-UCB-Batch, we run the algorithm for B = 10, 20, 30, 40, 50 respectively; for LSVI-UCB-Rare Switch, we set η = 2, 4, 8, 16, 32. |
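For concreteness, the quoted experiment setup and the two low-adaptivity schedules implied by the algorithm names can be summarized in a short sketch. This is a hypothetical illustration, not the authors' released code: the variable names, the evenly spaced batch schedule, and the determinant-based switching test are assumptions inferred from the algorithm names and the parameters quoted above (H = 10, K = 2500, δ = 0.35, d = 13, B ∈ {10, ..., 50}, η ∈ {2, ..., 32}).

```python
# Hypothetical sketch (not the authors' code) of the reported setup and of the
# two low-adaptivity update rules suggested by the algorithm names.
import numpy as np

# Parameters quoted in the paper's experiment section.
H, K, delta, d = 10, 2500, 0.35, 13      # horizon, episodes, confidence level, feature dimension
batch_counts = [10, 20, 30, 40, 50]      # B values tried for LSVI-UCB-Batch
switch_factors = [2, 4, 8, 16, 32]       # eta values tried for LSVI-UCB-Rare Switch


def batch_should_update(k: int, K: int, B: int) -> bool:
    """Sketch of the batch rule: re-solve the least-squares value iteration only
    at the start of each of B evenly spaced batches of episodes (boundary
    rounding is glossed over when B does not divide K)."""
    return (k - 1) % (K // B) == 0


def rare_switch_should_update(cov: np.ndarray, cov_at_last_update: np.ndarray,
                              eta: float) -> bool:
    """Sketch of the rare-switching rule: update the policy only once the feature
    covariance matrix's determinant has grown by a factor of eta since the last
    policy update."""
    return np.linalg.det(cov) >= eta * np.linalg.det(cov_at_last_update)
```

Under such a schedule, the batch variant performs a fixed number B of policy updates over the K episodes, while the rare-switching rule lets the number of updates depend on how quickly the feature covariance grows; both stand in contrast to the fully adaptive LSVI-UCB baseline quoted above, which updates after every episode.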