Incremental Stochastic Factorization for Online Reinforcement Learning

Authors: Andre Barreto, Rafael Beirigo, Joelle Pineau, Doina Precup

AAAI 2016

Reproducibility Variable — Result — LLM Response

Research Type: Experimental
  "Empirical results support the utility of the proposed algorithm. In this section we use computational experiments to illustrate some of the properties of EMSF. Since PLSA/NMF algorithms similar to EMSF have already been submitted to extensive empirical analysis (Wang and Zhang 2013), we focus on illustrating characteristics that are specific to EMSF."

Researcher Affiliation: Academia
  André M. S. Barreto and Rafael L. Beirigo, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil ({amsb, rafaelb}@lncc.br); Joelle Pineau and Doina Precup, School of Computer Science, McGill University, Montreal, QC, Canada ({jpineau, dprecup}@cs.mcgill.ca)

Pseudocode: Yes
  "Algorithm 1 Incremental EMSF"

Open Source Code: No
  The paper does not provide any concrete access to source code (e.g., repository links, explicit code-release statements, or code in supplementary materials) for the methodology described.

Open Datasets: No
  The paper mentions collecting "transitions sampled from P" and using "the game of blackjack" for experiments, but does not provide specific links, DOIs, repository names, or formal citations with authors/year for a publicly available or open dataset.

Dataset Splits: No
  The paper does not provide specific dataset-split information for validation (e.g., exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology).

Hardware Specification: No
  The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.

Software Dependencies: No
  The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.

Experiment Setup: Yes
  "At each iteration the algorithm gets a new sample transition, either from a finite dataset or from direct interaction with the MDP. The transition is then added to the appropriate matrix of countings Ca and discarded. If the sum of nonzero elements in the matrices Ca reaches a certain limit ηmax, defined according to the amount of memory available, the statistics stored in Ca are used to compute the updates to Da and Ka, which are accumulated in the auxiliary matrices D̂a and K̂a, as in (13). At every tc iterations the modifications in D̂a and K̂a are committed to Da and Ka. (...) Both algorithms used m = 10; EMSF was run with tc = τ − 1 and α = 1. (...) n = 100, m = 30 (...) Q-learning using a learning rate of 0.1 (this rate resulted in the best performance among the values {0.01, 0.1, 0.3}). (...) EMSF was run with tc = 100 episodes and ηmax = ."
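The buffering-and-commit schedule described in that excerpt can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name IncrementalEMSF, all parameter defaults, and the update performed in _accumulate_updates (a simple count-weighted accumulation standing in for the paper's EM equations, its eq. 13) are assumptions for demonstration. What it does reproduce is the control flow: transitions are folded into counting matrices Ca and discarded, the counts are consumed once their nonzero entries reach ηmax, and accumulated changes are committed to the factors D and Ka only every tc iterations.

```python
import numpy as np


class IncrementalEMSF:
    """Sketch of the buffer/commit schedule (hypothetical names and update rule).

    The transition model for each action a is factorized as P^a ~ D @ K^a,
    with D (n x m) and K^a (m x n) row-stochastic.
    """

    def __init__(self, n_states, n_actions, m, eta_max=10_000, t_c=100, seed=0):
        rng = np.random.default_rng(seed)
        self.eta_max, self.t_c, self.t = eta_max, t_c, 0
        # Counting matrices C^a: transitions are added here and discarded.
        self.C = [np.zeros((n_states, n_states)) for _ in range(n_actions)]
        # Row-stochastic factors D and K^a, randomly initialized.
        self.D = self._normalize(rng.random((n_states, m)))
        self.K = [self._normalize(rng.random((m, n_states))) for _ in range(n_actions)]
        # Auxiliary accumulators for pending updates (the paper's D-hat, K-hat).
        self.D_hat = np.zeros_like(self.D)
        self.K_hat = [np.zeros_like(k) for k in self.K]

    @staticmethod
    def _normalize(M):
        # Rescale each row to sum to 1 (rows are strictly positive here).
        return M / M.sum(axis=1, keepdims=True)

    def observe(self, s, a, s_next):
        """Process one sample transition (s, a, s')."""
        self.t += 1
        self.C[a][s, s_next] += 1  # fold the transition into C^a, then discard it
        # Memory limit reached: consume the statistics stored in the C^a.
        if sum(np.count_nonzero(Ca) for Ca in self.C) >= self.eta_max:
            self._accumulate_updates()
        # Every t_c iterations, commit pending modifications to D and K^a.
        if self.t % self.t_c == 0:
            self._commit()

    def _accumulate_updates(self):
        # Placeholder update: project the counts through the current factors.
        # (The paper instead computes proper EM updates from C^a.)
        for a, Ca in enumerate(self.C):
            self.K_hat[a] += self.D.T @ Ca
            self.D_hat += Ca @ self.K[a].T
            Ca[:] = 0  # statistics consumed; free the memory

    def _commit(self):
        if self.D_hat.sum() > 0:
            self.D = self._normalize(self.D + self.D_hat)
            self.D_hat[:] = 0
        for a, kh in enumerate(self.K_hat):
            if kh.sum() > 0:
                self.K[a] = self._normalize(self.K[a] + kh)
                kh[:] = 0
```

Keeping the commit step separate from the accumulation step is what makes the memory budget ηmax and the update period tc independent knobs, matching the two settings reported above (tc = τ − 1 in one experiment, tc = 100 episodes in the other).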