Incremental Stochastic Factorization for Online Reinforcement Learning
Authors: Andre Barreto, Rafael Beirigo, Joelle Pineau, Doina Precup
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results support the utility of the proposed algorithm. In this section we use computational experiments to illustrate some of the properties of EMSF. Since PLSA/NMF algorithms similar to EMSF have already been subjected to extensive empirical analysis (Wang and Zhang 2013), we focus on illustrating characteristics that are specific to EMSF. |
| Researcher Affiliation | Academia | André M. S. Barreto and Rafael L. Beirigo, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil {amsb, rafaelb}@lncc.br; Joelle Pineau and Doina Precup, School of Computer Science, McGill University, Montreal, QC, Canada {jpineau, dprecup}@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1 Incremental EMSF |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., repository links, explicit code release statements, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper mentions collecting 'transitions sampled from P' and using 'the game of blackjack' for experiments, but does not provide specific links, DOIs, repository names, or formal citations with authors/year for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information for validation (e.g., exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | At each iteration the algorithm gets a new sample transition, either from a finite dataset or from direct interaction with the MDP. The transition is added to the appropriate count matrix Cᵃ and then discarded. If the number of nonzero elements in the matrices Cᵃ reaches a certain limit ηmax, defined according to the amount of memory available, the statistics stored in Cᵃ are used to compute the updates to Dᵃ and Kᵃ, which are accumulated in the auxiliary matrices D̂ᵃ and K̂ᵃ, as in (13). Every tc iterations the modifications in D̂ᵃ and K̂ᵃ are committed to Dᵃ and Kᵃ. (...) Both algorithms used m = 10; EMSF was run with tc = τ − 1 and α = 1. (...) n = 100, m = 30 (...) Q-learning using a learning rate of 0.1 (this rate resulted in the best performance among the values {0.01, 0.1, 0.3}). (...) EMSF was run with tc = 100 episodes and ηmax = . A sketch of this update loop appears below the table. |
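
For concreteness, below is a minimal sketch of the incremental update loop quoted in the Experiment Setup row. This is not the authors' code: every name (`IncrementalEMSF`, `observe`, `eta_max`, `tc`, `alpha`) is hypothetical, the EM step is a textbook PLSA-style update rather than the paper's exact update (13), and the commit rule (a convex combination with step size α) is an assumption.

```python
import numpy as np


class IncrementalEMSF:
    """Hypothetical sketch of the incremental bookkeeping described above.

    The EM step is a textbook PLSA-style update, not necessarily the paper's
    update (13); the commit rule (convex combination with step size alpha)
    is likewise an assumption.
    """

    def __init__(self, n, n_actions, m, eta_max, tc, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.eta_max = eta_max  # memory budget on stored transition counts
        self.tc = tc            # commit interval, in iterations
        self.alpha = alpha      # step size for committing accumulated updates
        self.t = 0
        # Per-action count matrices C^a, and row-stochastic factors
        # D^a (n x m) and K^a (m x n) so that D^a @ K^a approximates P^a.
        self.C = [np.zeros((n, n)) for _ in range(n_actions)]
        self.D = [self._normalize(rng.random((n, m))) for _ in range(n_actions)]
        self.K = [self._normalize(rng.random((m, n))) for _ in range(n_actions)]
        self.D_hat = [D.copy() for D in self.D]  # accumulated updates
        self.K_hat = [K.copy() for K in self.K]

    @staticmethod
    def _normalize(M):
        # Renormalize rows to sum to one; all-zero rows stay zero.
        s = M.sum(axis=1, keepdims=True)
        return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

    def observe(self, s, a, s_next):
        """Add one sample transition to C^a; the sample is then discarded."""
        self.C[a][s, s_next] += 1.0
        self.t += 1
        # When the stored counts hit the memory limit eta_max, fold them
        # into the accumulated updates D_hat/K_hat and reset the counts.
        if sum(np.count_nonzero(C) for C in self.C) >= self.eta_max:
            for b, C in enumerate(self.C):
                if C.any():
                    self.D_hat[b], self.K_hat[b] = self._em_step(
                        C, self.D[b], self.K[b])
                    C.fill(0.0)
        # Every tc iterations, commit the accumulated updates to D and K.
        if self.t % self.tc == 0:
            for b in range(len(self.D)):
                self.D[b] = (1 - self.alpha) * self.D[b] + self.alpha * self.D_hat[b]
                self.K[b] = (1 - self.alpha) * self.K[b] + self.alpha * self.K_hat[b]

    def _em_step(self, C, D, K):
        # E-step: the responsibility of latent factor k for pair (i, j) is
        # proportional to D[i, k] * K[k, j]. M-step: count-weighted
        # accumulation followed by row renormalization.
        D_new, K_new = np.zeros_like(D), np.zeros_like(K)
        for i, j in zip(*C.nonzero()):
            r = D[i, :] * K[:, j]
            r /= max(r.sum(), 1e-12)
            D_new[i, :] += C[i, j] * r
            K_new[:, j] += C[i, j] * r
        # Rows with no observed transitions keep their previous values.
        for M_new, M_old in ((D_new, D), (K_new, K)):
            empty = M_new.sum(axis=1) == 0
            M_new[empty] = M_old[empty]
        return self._normalize(D_new), self._normalize(K_new)
```

Note that with α = 1, as in the experiments quoted above, each commit simply replaces Dᵃ and Kᵃ with the latest accumulated updates.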