Incremental Stochastic Factorization for Online Reinforcement Learning

Authors: Andre Barreto, Rafael Beirigo, Joelle Pineau, Doina Precup

AAAI 2016

Reproducibility Variable — Result — LLM Response

Research Type: Experimental
  "Empirical results support the utility of the proposed algorithm. In this section we use computational experiments to illustrate some of the properties of EMSF. Since PLSA/NMF algorithms similar to EMSF have already been submitted to extensive empirical analysis (Wang and Zhang 2013), we focus on illustrating characteristics that are specific to EMSF."

Researcher Affiliation: Academia
  André M. S. Barreto and Rafael L. Beirigo, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil ({amsb, rafaelb}@lncc.br); Joelle Pineau and Doina Precup, School of Computer Science, McGill University, Montreal, QC, Canada ({jpineau, dprecup}@cs.mcgill.ca)

Pseudocode: Yes
  "Algorithm 1 Incremental EMSF"

Open Source Code: No
  The paper does not provide any concrete access to source code (e.g., repository links, explicit code-release statements, or code in supplementary materials) for the methodology described.

Open Datasets: No
  The paper mentions collecting "transitions sampled from P" and using "the game of blackjack" for experiments, but does not provide specific links, DOIs, repository names, or formal citations with authors/year for a publicly available or open dataset.

Dataset Splits: No
  The paper does not provide specific dataset-split information for validation (e.g., exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology).

Hardware Specification: No
  The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.

Software Dependencies: No
  The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.

Experiment Setup: Yes
  "At each iteration the algorithm gets a new sample transition, either from a finite dataset or from direct interaction with the MDP. The transition is then added to the appropriate matrix of countings Ca and discarded. If the sum of nonzero elements in the matrices Ca reaches a certain limit ηmax, defined according to the amount of memory available, the statistics stored in Ca are used to compute the updates to Da and Ka, which are accumulated in the auxiliary matrices D̂a and K̂a, as in (13). At every tc iterations the modifications in D̂a and K̂a are committed to Da and Ka. (...) Both algorithms used m = 10; EMSF was run with tc = τ − 1 and α = 1. (...) n = 100, m = 30 (...) Q-learning using a learning rate of 0.1 (this rate resulted in the best performance among the values {0.01, 0.1, 0.3}). (...) EMSF was run with tc = 100 episodes and ηmax = ."
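The buffering-and-commit schedule described in that excerpt can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name IncrementalEMSF, all parameter defaults, and the update performed in _accumulate_updates (a simple count-weighted accumulation standing in for the paper's EM equations, its eq. 13) are assumptions for demonstration. What it does reproduce is the control flow: transitions are folded into counting matrices Ca and discarded, the counts are consumed once their nonzero entries reach ηmax, and accumulated changes are committed to the factors D and Ka only every tc iterations.

```python
import numpy as np


class IncrementalEMSF:
    """Sketch of the buffer/commit schedule (hypothetical names and update rule).

    The transition model for each action a is factorized as P^a ~ D @ K^a,
    with D (n x m) and K^a (m x n) row-stochastic.
    """

    def __init__(self, n_states, n_actions, m, eta_max=10_000, t_c=100, seed=0):
        rng = np.random.default_rng(seed)
        self.eta_max, self.t_c, self.t = eta_max, t_c, 0
        # Counting matrices C^a: transitions are added here and discarded.
        self.C = [np.zeros((n_states, n_states)) for _ in range(n_actions)]
        # Row-stochastic factors D and K^a, randomly initialized.
        self.D = self._normalize(rng.random((n_states, m)))
        self.K = [self._normalize(rng.random((m, n_states))) for _ in range(n_actions)]
        # Auxiliary accumulators for pending updates (the paper's D-hat, K-hat).
        self.D_hat = np.zeros_like(self.D)
        self.K_hat = [np.zeros_like(k) for k in self.K]

    @staticmethod
    def _normalize(M):
        # Rescale each row to sum to 1 (rows are strictly positive here).
        return M / M.sum(axis=1, keepdims=True)

    def observe(self, s, a, s_next):
        """Process one sample transition (s, a, s')."""
        self.t += 1
        self.C[a][s, s_next] += 1  # fold the transition into C^a, then discard it
        # Memory limit reached: consume the statistics stored in the C^a.
        if sum(np.count_nonzero(Ca) for Ca in self.C) >= self.eta_max:
            self._accumulate_updates()
        # Every t_c iterations, commit pending modifications to D and K^a.
        if self.t % self.t_c == 0:
            self._commit()

    def _accumulate_updates(self):
        # Placeholder update: project the counts through the current factors.
        # (The paper instead computes proper EM updates from C^a.)
        for a, Ca in enumerate(self.C):
            self.K_hat[a] += self.D.T @ Ca
            self.D_hat += Ca @ self.K[a].T
            Ca[:] = 0  # statistics consumed; free the memory

    def _commit(self):
        if self.D_hat.sum() > 0:
            self.D = self._normalize(self.D + self.D_hat)
            self.D_hat[:] = 0
        for a, kh in enumerate(self.K_hat):
            if kh.sum() > 0:
                self.K[a] = self._normalize(self.K[a] + kh)
                kh[:] = 0
```

Keeping the commit step separate from the accumulation step is what makes the memory budget ηmax and the update period tc independent knobs, matching the two settings reported above (tc = τ − 1 in one experiment, tc = 100 episodes in the other).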