Monte-Carlo Planning and Learning with Language Action Value Estimates
Authors: Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we demonstrate that our method achieves new high scores in various IF games. |
| Researcher Affiliation | Academia | Youngsoo Jang¹, Seokin Seo², Jongmin Lee¹, Kee-Eung Kim¹,² (¹School of Computing, KAIST, Daejeon, Republic of Korea; ²Graduate School of AI, KAIST, Daejeon, Republic of Korea) |
| Pseudocode | Yes | Appendix D: PSEUDOCODE OF MC-LAVE, and Algorithm 1: Monte-Carlo Planning with Language Action Value Estimates (MC-LAVE). A hedged illustrative sketch of the selection rule follows the table. |
| Open Source Code | Yes | "Our code is publicly available" (footnote 2: https://github.com/jys5609/MC-LAVE-RL) |
| Open Datasets | Yes | In this section, we show experimental results of our approach on IF games included in the Jericho environment (Hausknecht et al., 2020). A minimal Jericho usage sketch follows the table. |
| Dataset Splits | No | No explicit training/validation/test dataset splits with percentages or sample counts are provided, as the paper describes a reinforcement learning setup where data is generated through interaction. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the "Jericho framework" but does not provide specific version numbers for it or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | Appendix B: EXPERIMENT DETAILS, and Table 4: Configurations of MC-LAVE-RL used in our experimental results. Hyperparameters in the upper part of the table were adopted globally across the planning-learning framework, and the remaining hyperparameters are used only in the MCTS planning phase. |
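
For orientation on the pseudocode referenced above (Algorithm 1, MC-LAVE), the following is a minimal Python sketch of an MCTS action-selection rule of the kind the paper describes: a UCT-style score augmented by a per-action language action value estimate aggregated from semantically similar actions. The additive weighting `w_lang`, the exploration constant, and the helper names are illustrative assumptions, not the paper's exact formulation; see Appendix D of the paper and the released code for the authoritative version.

```python
import math

def mc_lave_score(q_mean, visit_count, parent_visits, q_language,
                  w_lang=0.5, c_uct=1.0):
    """Selection score for one candidate action at an MCTS node.

    Combines the node's empirical value with a UCT exploration bonus and a
    language action value estimate (q_language) summarizing Q-values of
    semantically similar actions.  The additive form and constants are
    assumptions for illustration, not the paper's exact formula.
    """
    exploration = c_uct * math.sqrt(math.log(parent_visits + 1) / (visit_count + 1))
    return q_mean + w_lang * q_language + exploration

def select_action(node_stats, parent_visits, language_values):
    """Pick the action maximizing the MC-LAVE-style selection score.

    node_stats:      {action: (q_mean, visit_count)} for the current node.
    language_values: {action: q_language}, e.g. produced by a (hypothetical)
                     similarity search over previously evaluated actions.
    """
    return max(
        node_stats,
        key=lambda a: mc_lave_score(*node_stats[a], parent_visits,
                                    language_values.get(a, 0.0)),
    )

# Toy usage: three textual actions with running statistics.
stats = {"open mailbox": (0.8, 12), "go north": (0.3, 4), "take leaflet": (0.5, 7)}
lang = {"go north": 0.6}   # bonus derived from semantically similar actions
print(select_action(stats, parent_visits=23, language_values=lang))
```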
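The experiments run on IF games from the Jericho environment. Assuming the standard Jericho Python package and a locally available game file, a minimal interaction loop looks roughly like the sketch below; the ROM path is a placeholder and is not part of the paper's released artifacts.

```python
# Minimal interaction loop with a Jericho IF game (sketch; requires
# `pip install jericho` and a locally available game file).
from jericho import FrotzEnv

env = FrotzEnv("roms/zork1.z5")            # placeholder ROM path
obs, info = env.reset()                    # initial scene description
print(obs)

valid_actions = env.get_valid_actions()    # Jericho's valid-action handicap
obs, reward, done, info = env.step(valid_actions[0])
print(reward, done, env.get_score())
```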