Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
Authors: Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvari, Mengdi Wang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments with a mountain car example. |
| Researcher Affiliation | Collaboration | DeepMind; Princeton University; University of Alberta. |
| Pseudocode | Yes | The pseudocode is given as Algorithm 1. In the last step, m = N samples are used to produce the final output to guarantee that the error introduced by the Monte-Carlo averaging is negligible compared to the rest. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | We conducted experiments with a mountain car example. We use 800 radial basis functions for linear value function approximation. The number of episodes collected by behavior policies ranges from 2 to 100. |
| Dataset Splits | No | The paper mentions that the dataset $D$ is split into $T$ nonoverlapping folds $D_1, \ldots, D_T$ for the algorithm, but does not specify standard training, validation, and test splits with explicit percentages or sample counts for reproducing the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For each algorithm we report the performance for the best regularization parameter λ in the range {0.02, 0.05, 0.1, 0.2, 0.5}. |
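
The Open Datasets row mentions 800 radial basis functions used for linear value function approximation on mountain car. Below is a minimal sketch of such a feature map, not the authors' code: the center placement (uniform random over the normalized state space), the Gaussian bandwidth, and the names `make_rbf_features` and `value_estimate` are assumptions, because the report does not specify them. Only the standard Mountain Car state bounds are taken as known.

```python
import numpy as np

# Standard Mountain Car observation bounds: (position, velocity).
LOW = np.array([-1.2, -0.07])
HIGH = np.array([0.6, 0.07])


def make_rbf_features(num_centers=800, bandwidth=0.1, seed=0):
    """Build a Gaussian RBF feature map phi: state -> R^num_centers.

    Hypothetical setup: center placement and bandwidth are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Centers drawn uniformly at random in the normalized state space [0, 1]^2.
    centers = rng.uniform(0.0, 1.0, size=(num_centers, 2))

    def phi(state):
        # Rescale the raw state so position and velocity share a [0, 1] scale.
        s = (np.asarray(state, dtype=float) - LOW) / (HIGH - LOW)
        sq_dist = np.sum((centers - s) ** 2, axis=1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    return phi


def value_estimate(phi, weights, state):
    """Linear value function approximation: V(s) ~= phi(s) . w."""
    return float(phi(state) @ weights)
```

For example, `make_rbf_features()([-0.5, 0.0])` returns an 800-dimensional vector with entries in (0, 1], and a linear value estimate is its dot product with a weight vector of the same length.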
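
The Dataset Splits row notes that the algorithm splits the dataset D into T nonoverlapping folds. A minimal sketch of such a split follows, assuming contiguous folds of near-equal size; the report does not say whether the data are shuffled first or how each fold is used, and `split_into_folds` is a hypothetical helper.

```python
def split_into_folds(dataset, num_folds):
    """Split a sequence into num_folds nonoverlapping, contiguous folds.

    Hypothetical helper: fold sizes differ by at most one element, and the
    first len(dataset) % num_folds folds each take one extra item.
    """
    n = len(dataset)
    if not 1 <= num_folds <= n:
        raise ValueError("num_folds must be between 1 and len(dataset)")
    base, extra = divmod(n, num_folds)
    folds, start = [], 0
    for t in range(num_folds):
        size = base + (1 if t < extra else 0)
        folds.append(dataset[start:start + size])
        start += size
    return folds
```

For instance, `split_into_folds(list(range(10)), 3)` returns `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`; concatenating the folds recovers the original sequence, so every sample lands in exactly one fold.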
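
The Experiment Setup row reports results for the best λ in {0.02, 0.05, 0.1, 0.2, 0.5}. The sketch below illustrates a grid sweep of that kind, using an L1-penalized (Lasso) least-squares fit solved by ISTA as a stand-in estimator. It is not the paper's Algorithm 1: the objective scaling, the held-out MSE selection criterion, and the helper names `lasso_ista` and `best_lambda` are assumptions, so the λ values here are not directly comparable to the paper's.

```python
import numpy as np

# Regularization grid reported in the Experiment Setup row.
LAMBDA_GRID = [0.02, 0.05, 0.1, 0.2, 0.5]


def lasso_ista(X, y, lam, num_iters=500):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 with ISTA (stand-in estimator)."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = sigma_max(X)^2 / n is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, ord=2) ** 2
    for _ in range(num_iters):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft-thresholding: the proximal operator of step * lam * ||w||_1.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


def best_lambda(X_train, y_train, X_val, y_val, grid=LAMBDA_GRID):
    """Fit once per lambda; return (lambda, validation MSE) with the lowest MSE (assumed criterion)."""
    scores = {}
    for lam in grid:
        w = lasso_ista(X_train, y_train, lam)
        scores[lam] = float(np.mean((X_val @ w - y_val) ** 2))
    lam_star = min(scores, key=scores.get)
    return lam_star, scores[lam_star]
```

ISTA is used here only because it fits in a few lines of NumPy. The objective matches scikit-learn's `Lasso` parameterization, with `alpha` playing the role of λ, so a coordinate-descent solver could be swapped in without changing the sweep.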