Boosted Fitted Q-Iteration

Authors: Samuele Tosatto, Matteo Pirotta, Carlo D’Eramo, Marcello Restelli

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study B-FQI both theoretically, providing also a finite-sample error upper bound for it, and empirically, by comparing its performance to the one of FQI in different domains and using different regression techniques. ... We empirically evaluate the behavior of FQI (Ernst et al., 2005), Neural FQI (Riedmiller, 2005) and B-FQI on two different MDPs. As regression models we consider extra-trees (Geurts et al., 2006) and neural networks (Goodfellow et al., 2016).
Researcher Affiliation | Academia | (1) Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milano, Italy; (2) IAS, Darmstadt, Germany; (3) SequeL Team, INRIA Lille Nord Europe.
Pseudocode | Yes | Algorithm 1 Boosted Fitted Q-Iteration. (A hedged sketch of the boosting step appears after this table.)
Open Source Code | No | The paper does not provide any concrete access to source code (e.g., a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology described.
Open Datasets | Yes | The experiments are based on the OpenAI Gym implementation (Brockman et al., 2016) (version v0). ... We collected 10 datasets of 2000 episodes to average the results. ... We collected 20 datasets of up to 2000 episodes to average the results. (An episode-collection sketch follows the table.)
Dataset Splits | No | The paper mentions collecting datasets and averaging results, but it does not specify explicit training, validation, and test splits with percentages or sample counts, nor does it reference predefined splits.
Hardware Specification | No | The paper does not provide any specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments; it refers to compute hardware only in a general sense.
Software Dependencies | No | The paper mentions extra-trees (Geurts et al., 2006), neural networks (Goodfellow et al., 2016), and the OpenAI Gym implementation (Brockman et al., 2016) (version v0), but it does not provide version numbers for software dependencies beyond the Gym version.
Experiment Setup | Yes | The extra-tree ensemble is composed of 30 regressors with a minimum number of samples per split of 4 and a minimum number of samples per leaf of 2. The neural network has 1 hidden layer with sigmoidal activation and is trained using RMSProp (Goodfellow et al., 2016). ... The extra-tree ensemble is composed of 50 regressors with a minimum number of samples per split of 7 and a minimum number of samples per leaf of 4. ... the neural network has 2 hidden layers composed of 10 sigmoidal neurons. ... Episodes are truncated at 5000 steps. (A configuration sketch follows the table.)
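
For the Pseudocode row: the paper's Algorithm 1 builds the Q-function as a sum of weak learners, where each iteration fits a regressor to the Bellman residual of the current estimate rather than regressing directly on the Bellman target as plain FQI does. The sketch below is a minimal illustration of that loop, not the authors' code; the transition-tuple layout, the discrete action set, and the choice of extra-trees as the weak learner are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def b_fqi(transitions, actions, gamma=0.99, iterations=50):
    """Boosted Fitted Q-Iteration (sketch).

    transitions: list of (state, action, reward, next_state, done) tuples,
    with states as 1-D numpy arrays and actions as integer indices.
    Returns the fitted weak learners; the Q estimate is their sum.
    """
    s = np.array([t[0] for t in transitions])
    a = np.array([t[1] for t in transitions]).reshape(-1, 1)
    r = np.array([t[2] for t in transitions])
    s_next = np.array([t[3] for t in transitions])
    done = np.array([t[4] for t in transitions], dtype=float)
    x = np.hstack([s, a])  # regressor input: state-action pairs

    learners = []

    def q_values(states, action):
        """Current Q estimate at (states, action): sum of weak learners."""
        if not learners:
            return np.zeros(len(states))
        xa = np.hstack([states, np.full((len(states), 1), float(action))])
        return np.sum([h.predict(xa) for h in learners], axis=0)

    for _ in range(iterations):
        # Empirical Bellman target: r + gamma * max_a' Q_k(s', a')
        q_next = np.max([q_values(s_next, act) for act in actions], axis=0)
        target = r + gamma * (1.0 - done) * q_next
        # B-FQI fits the Bellman *residual* T Q_k - Q_k ...
        q_curr = (np.sum([h.predict(x) for h in learners], axis=0)
                  if learners else np.zeros(len(x)))
        weak = ExtraTreesRegressor(n_estimators=30, min_samples_split=4,
                                   min_samples_leaf=2)
        weak.fit(x, target - q_curr)
        # ... and adds the new learner, so Q_{k+1} = Q_k + f_k.
        learners.append(weak)

    return learners
```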
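For the Open Datasets row: the paper collects batches of episodes from Gym v0 environments. A minimal collection loop under the pre-0.26 Gym API (which matches the v0-era interface; the environment name and the uniform random policy are placeholders, not details taken from the paper) might look like this:

```python
import gym

def collect_episodes(env_name="CartPole-v0", n_episodes=2000, max_steps=5000):
    """Collect (s, a, r, s', done) transitions with a uniform random policy."""
    env = gym.make(env_name)
    transitions = []
    for _ in range(n_episodes):
        state = env.reset()          # old Gym API: reset() returns the observation
        for _ in range(max_steps):   # the paper truncates episodes at 5000 steps
            action = env.action_space.sample()
            next_state, reward, done, _ = env.step(action)
            transitions.append((state, int(action), reward, next_state, done))
            state = next_state
            if done:
                break
    return transitions
```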
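For the Experiment Setup row: the reported hyperparameters map directly onto standard library objects. A sketch, assuming scikit-learn for the extra-trees ensemble and PyTorch for the one-hidden-layer sigmoidal network trained with RMSProp (the paper does not name its libraries, so both are substitutions):

```python
import torch
import torch.nn as nn
from sklearn.ensemble import ExtraTreesRegressor

# Extra-trees ensemble for the first domain: 30 regressors, minimum
# samples per split 4, minimum samples per leaf 2 (as reported).
extra_trees = ExtraTreesRegressor(n_estimators=30, min_samples_split=4,
                                  min_samples_leaf=2)

def make_network(input_dim, hidden=10):
    """One hidden layer with sigmoidal activation; the hidden width is not
    stated for this domain, so 10 units here is an assumption."""
    model = nn.Sequential(
        nn.Linear(input_dim, hidden),
        nn.Sigmoid(),
        nn.Linear(hidden, 1),
    )
    # Trained with RMSProp, as reported; the learning rate is a guess.
    optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
    return model, optimizer
```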