Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
Authors: Tianhao Wang, Dongruo Zhou, Quanquan Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6 we present the numerical experiment which supports our theory. |
| Researcher Affiliation | Academia | Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, tianhao.wang@yale.edu; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, drzhou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 LSVI-UCB-Batch |
| Open Source Code | No | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | No | We run our algorithms, LSVI-UCB-Batch and LSVI-UCB-Rare Switch, on a synthetic linear MDP given in Example 6.1, and compare them with the fully adaptive baseline, LSVI-UCB (Jin et al., 2020). |
| Dataset Splits | No | The paper uses a synthetic MDP and evaluates performance using regret over episodes; it does not describe dataset splits like training, validation, or test sets. |
| Hardware Specification | Yes | All experiments are performed on a PC with Intel i7-9700K CPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiment, we set H = 10, K = 2500, δ = 0.35 and d = 13, thus A contains 1024 actions. [...] In detail, for LSVI-UCB-Batch, we run the algorithm for B = 10, 20, 30, 40, 50 respectively; for LSVI-UCB-Rare Switch, we set η = 2, 4, 8, 16, 32. |
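
For readers reconstructing the setup, the sketch below illustrates the two policy-update schedules named in the Pseudocode and Experiment Setup rows: LSVI-UCB-Batch recomputes its Q-functions only at B evenly spaced batch boundaries, while LSVI-UCB-Rare Switch recomputes them whenever the Gram matrix determinant has grown by a factor η since the last update. This is a minimal sketch, not the authors' code: the helpers `recompute_q_functions` and `run_episode` are hypothetical stand-ins for the LSVI-UCB regression/bonus step and the environment rollout, and only the parameter values (H, K, d, B, η) are taken from the Experiment Setup row above.

```python
import numpy as np

# Hypothetical helpers standing in for the paper's LSVI-UCB machinery.

def recompute_q_functions(data):
    """Placeholder for least-squares value iteration with UCB bonuses."""
    return data  # a real implementation would return H optimistic Q-functions

def run_episode(policy, H, d):
    """Placeholder rollout; returns the episode's feature vectors, one per step."""
    return [np.random.randn(d) for _ in range(H)]

H, K, d = 10, 2500, 13   # horizon, episodes, feature dimension (Experiment Setup row)
B, eta = 10, 2.0         # batch count (LSVI-UCB-Batch), det-ratio threshold (Rare Switch)

# --- LSVI-UCB-Batch: recompute the policy only at the B batch boundaries ---
batch_starts = {round(b * K / B) for b in range(B)}
data, policy, switches = [], None, 0
for k in range(K):
    if k in batch_starts:
        policy = recompute_q_functions(data)
        switches += 1
    data.extend(run_episode(policy, H, d))
print("batch schedule switches:", switches)          # exactly B

# --- LSVI-UCB-Rare Switch: recompute when det(Lambda) grows by a factor eta ---
# (the paper tracks one Gram matrix per step h; a single matrix suffices here)
Lambda = np.eye(d)
det_at_last_switch = np.linalg.det(Lambda)
data, policy, switches = [], recompute_q_functions([]), 1
for k in range(K):
    episode = run_episode(policy, H, d)
    data.extend(episode)
    for phi in episode:
        Lambda += np.outer(phi, phi)
    if np.linalg.det(Lambda) > eta * det_at_last_switch:
        policy = recompute_q_functions(data)
        det_at_last_switch = np.linalg.det(Lambda)
        switches += 1
print("rare-switch schedule switches:", switches)    # grows only logarithmically in K
```

Varying B over {10, 20, 30, 40, 50} and η over {2, 4, 8, 16, 32}, as in the Experiment Setup row, changes only how often the policy is recomputed, which is the adaptivity constraint the paper studies.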