An Expectation-Maximization Algorithm to Compute a Stochastic Factorization From Data

Authors: Andre M. S. Barreto, Rafael L. Beirigo, Joelle Pineau, Doina Precup

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we present computational experiments to illustrate the properties and usefulness of our algorithm. The first experiment is a proof of concept: we generated transition matrices $P \in \mathbb{R}^{100 \times 100}$, with srk(P) = 20, and tried to recover them with EMSF using different values for m. We then compared EMSF's results with those obtained by directly estimating P via maximum likelihood (referred to in the plots as CNT, for "counting").
Researcher Affiliation | Academia | André M. S. Barreto and Rafael L. Beirigo, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil, {amsb,rafaelb}@lncc.br; Joelle Pineau and Doina Precup, McGill University, Montreal, QC, Canada, {jpineau,dprecup}@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1 shows a step by step description of the proposed method, which we call EMSF. The pseudo-code uses two matrices to represent the forward and backward variables: $\alpha \in \mathbb{R}^{(\tau-1) \times m}$, where $\alpha_{ti} = \hat{\alpha}_i(t)$, and $\beta \in \mathbb{R}^{(\tau-1) \times m}$, where $\beta_{ti} = \hat{\beta}_i(t)$. For each sequence $z^l_{1:\tau_l}$, EMSF first computes α using (5) and then computes β using (8). Then, α and β are multiplied element-wise, giving rise to matrix $C \in \mathbb{R}^{(\tau-1) \times m}$.
Open Source Code | No | The paper does not provide any explicit statements about making source code available or links to a code repository.
Open Datasets | Yes | The game of blackjack was implemented exactly as described in Section 5.1 of Sutton and Barto's [1998] book.
Dataset Splits | No | The paper does not explicitly specify dataset splits for training, validation, and testing or reference a validation set.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | The first experiment is a proof of concept: we generated transition matrices $P \in \mathbb{R}^{100 \times 100}$, with srk(P) = 20, and tried to recover them with EMSF using different values for m. ... For each value of τ, c = 10 trajectories were generated per run. ... The game of blackjack was implemented exactly as described in Section 5.1 of Sutton and Barto's [1998] book. The experiments were carried out as follows. First, an exploration policy that selects actions uniformly at random played a number of hands of blackjack.
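
The proof-of-concept setup quoted above (a 100 x 100 transition matrix with a stochastic factorization of order 20, c = 10 trajectories, and the counting baseline CNT) can be illustrated with a rough sketch. This is not the authors' code: the way D and K are drawn, the uniform initial state, the uniform fallback for unvisited rows, the value tau = 1000, and all function names are assumptions made here for illustration only. EMSF itself is not implemented, since equations (5) and (8) are not reproduced in this summary.

```python
import numpy as np

def random_stochastic(rows, cols, rng):
    """Random row-stochastic matrix (each row sums to 1). Sampling scheme is an assumption."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=1, keepdims=True)

def sample_trajectory(P, tau, rng):
    """Sample a state sequence of length tau from the Markov chain with transition matrix P."""
    n = P.shape[0]
    z = np.empty(tau, dtype=int)
    z[0] = rng.integers(n)  # assumption: uniform initial state
    for t in range(1, tau):
        z[t] = rng.choice(n, p=P[z[t - 1]])
    return z

def count_estimate(trajectories, n):
    """CNT baseline: maximum-likelihood estimate of P from observed transition counts."""
    counts = np.zeros((n, n))
    for z in trajectories:
        # Accumulate one count per observed transition z[t] -> z[t+1].
        np.add.at(counts, (z[:-1], z[1:]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows that were never visited have no ML estimate; use uniform (assumption).
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n)

rng = np.random.default_rng(0)
n, m, c = 100, 20, 10   # sizes and trajectory count taken from the quoted setup
tau = 1000              # hypothetical trajectory length; the summary does not list the values used

D = random_stochastic(n, m, rng)   # n x m stochastic matrix
K = random_stochastic(m, n, rng)   # m x n stochastic matrix
P = D @ K                          # stochastic, with stochastic rank at most m

trajs = [sample_trajectory(P, tau, rng) for _ in range(c)]
P_hat = count_estimate(trajs, n)
print("mean absolute error of CNT:", np.abs(P_hat - P).mean())
```

The counting estimator fits each of the n x n entries independently, whereas a factorization of order m only has n*m + m*n free entries, which is the kind of gap the quoted comparison between EMSF and CNT is designed to expose.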