Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
Authors: Dongruo Zhou, Quanquan Gu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states: "We conduct some numerical experiments to suggest the validity of HF-UCRL-VTR+ in Appendix A." |
| Researcher Affiliation | Academia | Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, drzhou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1: Weighted OFUL+; Algorithm 2: HF-UCRL-VTR+; Algorithm 3: High-order moment estimator (HOME) |
| Open Source Code | No | The paper checklist states: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]" |
| Open Datasets | No | The paper does not specify the use of any publicly available datasets or provide access information for data used in numerical experiments. |
| Dataset Splits | No | The paper does not provide explicit training, validation, or test dataset splits. The paper checklist states: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]" |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments. The paper checklist states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]" |
| Software Dependencies | No | The paper does not list the software dependencies or version numbers needed to replicate the experiments. The paper checklist states: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]" |
| Experiment Setup | No | The paper states that for the numerical experiments "the parameter B in the MDP is 1, d = 4... the regularization parameter λ = 0.01 and α = 0.001," but it does not describe the full experimental protocol or how these hyperparameters were chosen. The paper checklist states: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]" |
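
The Pseudocode and Experiment Setup rows above mention the paper's Weighted OFUL+ / HF-UCRL-VTR+ algorithms and a handful of quoted parameters (d = 4, λ = 0.01). For orientation only, below is a minimal sketch of a weighted ridge-regression update of the kind such algorithms build on. Only d and λ are taken from the quoted setup; the function name `weighted_ridge_update`, the feature vectors, targets, weights, and `theta_star` are made-up placeholders, not the paper's actual construction or experimental code.

```python
import numpy as np

# Minimal sketch (not the paper's code): a weighted ridge-regression estimator of the
# kind Weighted OFUL+ / HF-UCRL-VTR+ builds on. Only d = 4 and lambda = 0.01 come from
# the quoted experiment setup; every feature, target, and weight below is a placeholder.

d = 4          # feature dimension (quoted in the Experiment Setup row)
lam = 0.01     # ridge regularization parameter lambda (quoted in the Experiment Setup row)

Sigma = lam * np.eye(d)   # regularized Gram matrix: lam*I + sum_k w_k x_k x_k^T
b = np.zeros(d)           # weighted target vector:  sum_k w_k y_k x_k


def weighted_ridge_update(x, y, w):
    """Rank-one update with feature x, regression target y, and weight w
    (e.g., an inverse-variance estimate); returns the current estimate."""
    global Sigma, b
    Sigma = Sigma + w * np.outer(x, x)
    b = b + w * y * x
    return np.linalg.solve(Sigma, b)


# Toy usage with random placeholder data.
rng = np.random.default_rng(0)
theta_star = rng.normal(size=d)            # unknown parameter (placeholder)
for _ in range(50):
    x = rng.normal(size=d)
    y = float(x @ theta_star) + 0.1 * rng.normal()
    theta_hat = weighted_ridge_update(x, y, w=1.0)
print("estimate:", theta_hat)
```

In the paper, the weights would come from estimated variances and higher-order moments (the role Algorithm 3, HOME, plays); this sketch does not attempt to reproduce that machinery.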