Sparse Multi-Task Reinforcement Learning

Authors: Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate the empirical performance of GL FQI and FL FQI and compare their results to single-task LASSO FQI in two variants of the blackjack game.
Researcher Affiliation | Academia | Daniele Calandriello and Alessandro Lazaric, Team SequeL, INRIA Lille - Nord Europe, France; Marcello Restelli, DEIB, Politecnico di Milano, Italy
Pseudocode | Yes | Figure 1: Linear FQI with fixed design and fresh samples at each iteration in a multi-task setting. (A code sketch follows the table.)
Open Source Code | No | The paper does not provide any specific statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes generating tasks for a blackjack game simulation ('The tasks are generated by selecting 2, 4, 6, 8 decks...'). It does not provide access information for a publicly available dataset.
Dataset Splits | No | The paper mentions collecting 'samples from up to 5000 episodes' for learning and simulating 'the learned policy for 2,000,000 episodes' for evaluation, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not specify any software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | For each algorithm we report the performance for the best regularization parameter λ in the range {2, 5, 10, 20, 50}. We used regularizers in the range {0.1, 1, 2, 5, 10}. The tasks are generated by selecting 2, 4, 6, 8 decks, by setting the stay threshold at {16, 17} and whether the dealer hits on soft, for a total of 16 tasks. (The 16-task grid is enumerated in the snippet after the table.)
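
To make the Pseudocode row above concrete, here is a minimal sketch of linear FQI run jointly over several tasks. It is written under stated assumptions rather than as the authors' exact algorithm: scikit-learn's MultiTaskLasso stands in for the joint group-lasso regression step, the function name multitask_linear_fqi and all array shapes are illustrative, and the same batch of samples is reused at every iteration, whereas the paper's Figure 1 assumes fresh samples at each iteration.

```python
# A minimal sketch, assuming:
#  - one fixed design matrix shared by all tasks (a fixed-design setting),
#  - scikit-learn's MultiTaskLasso as a stand-in for the joint group-lasso regression,
#  - the same sample batch reused at every iteration (Figure 1 instead assumes
#    fresh samples per iteration).
import numpy as np
from sklearn.linear_model import MultiTaskLasso


def multitask_linear_fqi(phi, rewards, phi_next, gamma=0.9, n_iters=50, lam=1.0):
    """Jointly fit linear Q-functions for several tasks with an l2,1-coupled regression.

    phi:      (n_samples, d)            features of the sampled state-action pairs
    rewards:  (n_samples, n_tasks)      reward observed in each task for those samples
    phi_next: (n_samples, n_actions, d) features of every action at the next state
    Returns W of shape (d, n_tasks), one weight vector per task.
    """
    d, n_tasks = phi.shape[1], rewards.shape[1]
    W = np.zeros((d, n_tasks))
    for _ in range(n_iters):
        # Bootstrap target for every task: r + gamma * max_a phi(s', a) . w_task
        q_next = np.einsum("nad,dt->nat", phi_next, W)   # (n_samples, n_actions, n_tasks)
        targets = rewards + gamma * q_next.max(axis=1)   # (n_samples, n_tasks)
        # One regression over all tasks; the l2,1 penalty zeroes out features jointly.
        reg = MultiTaskLasso(alpha=lam, fit_intercept=False, max_iter=5000)
        reg.fit(phi, targets)
        W = reg.coef_.T                                  # coef_ has shape (n_tasks, d)
    return W
```

Greedy policies for each task can then be read off column-wise from W by taking the action maximizing phi(s, a) . w_task.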
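
The Experiment Setup row quotes the parameter grid that defines the 16 blackjack tasks and two regularizer ranges. The snippet below simply enumerates that grid; the variable names are hypothetical, and since the quote does not pair either λ range with a specific algorithm, no pairing is assumed.

```python
from itertools import product

# Hypothetical names; the values are the ones quoted in the Experiment Setup row.
DECKS = (2, 4, 6, 8)
STAY_THRESHOLDS = (16, 17)
DEALER_HITS_SOFT = (False, True)

tasks = [
    {"decks": d, "stay_threshold": s, "dealer_hits_soft": h}
    for d, s, h in product(DECKS, STAY_THRESHOLDS, DEALER_HITS_SOFT)
]
assert len(tasks) == 16  # 4 deck counts x 2 thresholds x 2 dealer rules

# The two regularizer ranges quoted in the setup; which grid goes with which
# algorithm is not stated in the quote, so they are left unlabeled here.
LAMBDA_GRID_A = (2, 5, 10, 20, 50)
LAMBDA_GRID_B = (0.1, 1, 2, 5, 10)
```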