Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Authors: Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvari, Mengdi Wang
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments with a mountain car example. |
| Researcher Affiliation | Collaboration | DeepMind, Princeton University, University of Alberta. |
| Pseudocode | Yes | The pseudocode is given as Algorithm 1. In the last step, m = N samples are used to produce the final output to guarantee that the error introduced by the Monte-Carlo averaging is negligible compared to the rest. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | We conducted experiments with a mountain car example. We use 800 radial basis functions for linear value function approximation. The number of episodes collected by behavior policies ranges from 2 to 100. |
| Dataset Splits | No | The paper mentions that the dataset D is split into T non-overlapping folds D1, ..., DT for the algorithm, but does not specify standard training, validation, and test dataset splits with explicit percentages or sample counts for reproducing the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For each algorithm we report the performance for the best regularization parameter λ in the range {0.02, 0.05, 0.1, 0.2, 0.5}. |
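The Open Datasets row mentions 800 radial basis functions for linear value function approximation on the mountain car example. The paper does not spell out the feature construction, so the sketch below is an assumption, not the authors' code: Gaussian RBFs with centers on an illustrative 40 × 20 grid over the standard mountain-car state space and a hand-picked bandwidth.

```python
import numpy as np

# Hypothetical sketch of an 800-dimensional Gaussian RBF featurization of the
# 2-D mountain-car state (position, velocity). The 40 x 20 grid of centers and
# the bandwidth are illustrative assumptions; the paper does not specify them.

POS_RANGE = (-1.2, 0.6)    # standard mountain-car position bounds
VEL_RANGE = (-0.07, 0.07)  # standard mountain-car velocity bounds

def make_rbf_centers(n_pos=40, n_vel=20):
    """Place n_pos * n_vel = 800 centers on a uniform grid over the state space."""
    pos = np.linspace(*POS_RANGE, n_pos)
    vel = np.linspace(*VEL_RANGE, n_vel)
    return np.array([(p, v) for p in pos for v in vel])  # shape (800, 2)

def rbf_features(state, centers, bandwidth=0.1):
    """Map a state (position, velocity) to an 800-dim Gaussian RBF feature vector."""
    # Normalize each coordinate so a single bandwidth applies to both dimensions.
    scale = np.array([POS_RANGE[1] - POS_RANGE[0], VEL_RANGE[1] - VEL_RANGE[0]])
    diff = (centers - np.asarray(state)) / scale
    return np.exp(-np.sum(diff ** 2, axis=1) / (2 * bandwidth ** 2))

centers = make_rbf_centers()
phi = rbf_features((-0.5, 0.0), centers)  # feature vector for the start state
print(phi.shape)  # (800,)
```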
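The Dataset Splits and Experiment Setup rows describe two pieces of experimental bookkeeping: the batch dataset is split into T non-overlapping folds D1, ..., DT consumed by the algorithm, and performance is reported for the best regularization parameter λ in {0.02, 0.05, 0.1, 0.2, 0.5}. Below is a minimal sketch of both, assuming hypothetical callables (`run_algorithm`, `evaluate_policy`) that stand in for the paper's fitted-value-iteration procedure and its Monte-Carlo policy evaluation.

```python
import numpy as np

# Sketch of the fold split and the lambda sweep described in the table.
# `run_algorithm` and `evaluate_policy` are hypothetical placeholders.

LAMBDA_GRID = [0.02, 0.05, 0.1, 0.2, 0.5]

def split_into_folds(n_samples, T, seed=0):
    """Return T non-overlapping index sets D_1, ..., D_T covering all transitions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return np.array_split(perm, T)

def report_best_lambda(run_algorithm, evaluate_policy):
    """Run the algorithm once per lambda and report the best evaluated return,
    mirroring the 'best regularization parameter' reporting in the table."""
    results = {lam: evaluate_policy(run_algorithm(lam)) for lam in LAMBDA_GRID}
    best = max(results, key=results.get)
    return best, results[best]
```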