Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

Authors: Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conducted experiments with a mountain car example. |
| Researcher Affiliation | Collaboration | DeepMind; Princeton University; University of Alberta. |
| Pseudocode | Yes | The pseudocode is given as Algorithm 1. In the last step, m = N samples are used to produce the final output, so that the error introduced by the Monte-Carlo averaging is negligible compared to the rest. (A fold-splitting sketch follows the table.) |
| Open Source Code | No | The paper provides no explicit statement or link indicating that source code for the methodology is openly available. |
| Open Datasets | No | We conducted experiments with a mountain car example. We use 800 radial basis functions for linear value function approximation. The number of episodes collected by behavior policies ranges from 2 to 100. |
| Dataset Splits | No | The paper states that the dataset D is split into T nonoverlapping folds D_1, ..., D_T for the algorithm, but it does not specify standard training/validation/test splits with explicit percentages or sample counts. |
| Hardware Specification | No | The paper does not report the specific hardware (e.g., exact GPU/CPU models, memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper does not name ancillary software with version numbers (e.g., libraries or solvers) needed to replicate the experiments. |
| Experiment Setup | Yes | For each algorithm, performance is reported for the best regularization parameter λ in the range {0.02, 0.05, 0.1, 0.2, 0.5}. (A featurization and grid-search sketch follows the table.) |