Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes
Authors: Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our proposed method not only provides a viable mechanism to uncover latent patterns in decision processes, but also obtains state-of-the-art performance in these text games. In this section, we use two text games from [11] to evaluate our proposed model and demonstrate the idea of interpreting the decision making processes: (i) Saving John and (ii) Machine of Death. Table 1 summarizes the means and standard deviations of the rewards on the two games. |
| Researcher Affiliation | Industry | Microsoft Research, Redmond, WA, USA ({jianshuc,lin.xiao}@microsoft.com); Google Inc., Kirkland, WA, USA ({chongw,lihong}@google.com); Citadel LLC, Seattle/Chicago, USA ({Ji.He,Li.Deng}@citadel.com) |
| Pseudocode | Yes | Algorithm 1: The training algorithm by mirror descent back propagation. Algorithm 2: The recursive MAP inference for one episode. |
| Open Source Code | No | The paper does not explicitly state that the authors' implementation code for Q-LDA is open-sourced or provide a link to it. Footnote 5 refers to the simulators (text games) which are third-party. |
| Open Datasets | Yes | In this section, we use two text games from [11] to evaluate our proposed model and demonstrate the idea of interpreting the decision making processes: (i) Saving John and (ii) Machine of Death (see Appendix C for a brief introduction of the two games). The simulators are obtained from https://github.com/jvking/text-games |
| Dataset Splits | No | The paper describes data collection for experience replay and states that results are not evaluated on the training dataset, but it does not specify explicit train/validation/test splits or percentages for the datasets used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For example, at each m-th experience-replay learning (see Algorithm 1), we use the softmax action selection rule [21, pp. 30-31] as the exploration policy to collect data (see Appendix E.3 for more details). We collect M = 200 episodes of data (about 3K time steps in Saving John and 16K in Machine of Death) at each of D = 20 experience replays, which amounts to a total of 4,000 episodes. At each experience replay, we update the model with 10 epochs before the next replay. (A hedged sketch of this schedule appears after the table.) |
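
The experiment-setup row above quotes a data-collection schedule: softmax action selection as the exploration policy, M = 200 episodes collected at each of D = 20 experience replays, and 10 epochs of model updates between replays. Below is a minimal Python sketch of that outer schedule only. The environment, Q-function, and update interfaces are assumptions introduced for illustration; this is not the authors' Q-LDA implementation (Algorithm 1 in the paper), which trains the model by mirror descent back propagation.

```python
"""Hedged sketch of the experience-replay schedule quoted in the table:
D = 20 replays, M = 200 episodes per replay, 10 update epochs per replay,
with softmax (Boltzmann) exploration over Q-values. All interfaces below
(env, q_function, update_model) are hypothetical, not the authors' code."""

import numpy as np


def softmax_action(q_values, temperature=1.0, rng=None):
    """Sample an action index with probability proportional to exp(Q / T)."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)


def run_experience_replay_training(env, q_function, update_model,
                                   num_replays=20, episodes_per_replay=200,
                                   epochs_per_replay=10, temperature=1.0):
    """Outer training schedule: collect episodes with softmax exploration,
    then update the model for a fixed number of epochs before the next replay.

    Assumed (hypothetical) interfaces:
      env.reset() -> observation
      env.step(action) -> (next_observation, reward, done)
      q_function(observation) -> 1-D array of Q-values over feasible actions
      update_model(replay_buffer, epochs) -> None
    """
    replay_buffer = []
    for _ in range(num_replays):
        for _ in range(episodes_per_replay):
            obs, done, episode = env.reset(), False, []
            while not done:
                action = softmax_action(q_function(obs), temperature)
                next_obs, reward, done = env.step(action)
                episode.append((obs, action, reward, next_obs, done))
                obs = next_obs
            replay_buffer.append(episode)
        update_model(replay_buffer, epochs=epochs_per_replay)
    return replay_buffer
```

Sampling actions in proportion to exp(Q/T) follows the softmax action-selection rule the paper cites from [21]; a lower temperature makes the exploration policy greedier with respect to the current Q-values.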