Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes
Authors: Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our proposed method not only provides a viable mechanism to uncover latent patterns in decision processes, but also obtains state-of-the-art performance in these text games. In this section, we use two text games from [11] to evaluate our proposed model and demonstrate the idea of interpreting the decision making processes: (i) Saving John and (ii) Machine of Death. Table 1 summarizes the means and standard deviations of the rewards on the two games. |
| Researcher Affiliation | Industry | Microsoft Research, Redmond, WA, USA; Google Inc., Kirkland, WA, USA; Citadel LLC, Seattle/Chicago, USA |
| Pseudocode | Yes | Algorithm 1: The training algorithm by mirror descent back propagation. Algorithm 2: The recursive MAP inference for one episode. |
| Open Source Code | No | The paper does not explicitly state that the authors' implementation code for Q-LDA is open-sourced, nor does it provide a link to it. Footnote 5 refers to the simulators (text games), which are third-party. |
| Open Datasets | Yes | In this section, we use two text games from [11] to evaluate our proposed model and demonstrate the idea of interpreting the decision making processes: (i) Saving John and (ii) Machine of Death (see Appendix C for a brief introduction of the two games). The simulators are obtained from https://github.com/jvking/text-games |
| Dataset Splits | No | The paper describes data collection for experience replay and states that results are not evaluated on the training dataset, but it does not specify explicit train/validation/test splits or percentages for the datasets used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For example, at each m-th experience-replay learning (see Algorithm 1), we use the softmax action selection rule [21, pp. 30-31] as the exploration policy to collect data (see Appendix E.3 for more details). We collect M = 200 episodes of data (about 3K time steps in Saving John and 16K in Machine of Death) at each of D = 20 experience replays, which amounts to a total of 4,000 episodes. At each experience replay, we update the model with 10 epochs before the next replay. |
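The Experiment Setup row can be made concrete with a small sketch of the data-collection schedule: softmax (Boltzmann) exploration over action values, followed by D = 20 rounds of collecting M = 200 episodes and training for 10 epochs, giving 200 × 20 = 4,000 episodes. This is an illustration under stated assumptions, not the authors' code (which is not released). The `temperature` parameter, the callables `collect_episode` and `train_epoch`, and the choice to accumulate all episodes in one replay buffer are hypothetical; only M, D, and the epoch count come from the row above.

```python
import math
import random
from typing import Callable, List, Sequence

# Values reported in the Experiment Setup row; everything else here is hypothetical.
M_EPISODES_PER_REPLAY = 200   # M: episodes collected at each experience replay
D_REPLAYS = 20                # D: number of experience replays
EPOCHS_PER_REPLAY = 10        # training epochs before the next replay


def softmax_probs(q_values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Boltzmann (softmax) action probabilities, proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values: Sequence[float], temperature: float = 1.0, rng=random) -> int:
    """Sample one action index from the softmax distribution (the exploration policy)."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_schedule(collect_episode: Callable[[], object],
                 train_epoch: Callable[[List[object]], None],
                 m: int = M_EPISODES_PER_REPLAY,
                 d: int = D_REPLAYS,
                 epochs: int = EPOCHS_PER_REPLAY) -> int:
    """Alternate data collection and training; return the total number of episodes."""
    # Assumption: episodes from every replay are kept in a single growing buffer.
    replay_buffer: List[object] = []
    for _ in range(d):
        # Collect M fresh episodes with the current exploration policy.
        replay_buffer.extend(collect_episode() for _ in range(m))
        # Train for a fixed number of epochs before the next replay.
        for _ in range(epochs):
            train_epoch(replay_buffer)
    return len(replay_buffer)
```

With stub callables, `run_schedule(lambda: None, lambda buf: None)` returns 4,000 episodes and performs 20 × 10 = 200 training epochs. A lower temperature makes `softmax_action` closer to greedy, while a higher one spreads probability more evenly across actions.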