Monte-Carlo Planning and Learning with Language Action Value Estimates

Authors: Youngsoo Jang, Seokin Seo, Jongmin Lee, Kee-Eung Kim

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In the experiments, we demonstrate that our method achieves new high scores in various IF games." |
| Researcher Affiliation | Academia | "Youngsoo Jang (1), Seokin Seo (2), Jongmin Lee (1), Kee-Eung Kim (1, 2); (1) School of Computing, KAIST, Daejeon, Republic of Korea; (2) Graduate School of AI, KAIST, Daejeon, Republic of Korea" |
| Pseudocode | Yes | "Appendix D: PSEUDOCODE OF MC-LAVE" and "Algorithm 1: Monte-Carlo Planning with Language Action Value Estimates (MC-LAVE)" (a planning-loop sketch follows the table) |
| Open Source Code | Yes | "Our code is publicly available." (footnote: https://github.com/jys5609/MC-LAVE-RL) |
| Open Datasets | Yes | "In this section, we show experimental results of our approach on IF games included in the Jericho environment (Hausknecht et al., 2020)." (a Jericho usage sketch also follows the table) |
| Dataset Splits | No | No explicit training/validation/test splits with percentages or sample counts are reported; the paper describes a reinforcement learning setup in which data is generated through environment interaction. |
| Hardware Specification | No | No hardware details (GPU models, CPU models, or memory) used to run the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions the Jericho framework but provides no version numbers for it or for other software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | "Appendix B: EXPERIMENTS DETAILS" and "Table 4: Configurations of MC-LAVE-RL used in our experimental results. Hyperparameters in the upside of the table were globally adapted in the planning-learning framework and the other hyperparameters are used only in the MCTS planning phase." |
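
As a rough illustration of what Algorithm 1 computes, below is a minimal Python sketch of MCTS action selection augmented with a language action value estimate. It is an assumption-laden approximation, not the paper's exact rule: the names `q_lave`, `c_uct`, and `c_lang` are hypothetical, the similarity-weighted averaging stands in for however the paper aggregates values of semantically related actions, and the actual selection rule should be taken from Algorithm 1 (Appendix D).

```python
import math
from collections import defaultdict

import numpy as np


class Node:
    """Per-state MCTS statistics."""
    def __init__(self):
        self.visit_count = defaultdict(int)    # N(s, a)
        self.value_sum = defaultdict(float)    # sum of sampled returns for (s, a)


def q_lave(action, experience, embed, temperature=1.0):
    """Hypothetical language action value estimate: average the values of
    previously seen actions, weighted by embedding similarity to `action`.
    `embed` is assumed to return L2-normalized vectors, so the dot
    product acts as cosine similarity."""
    if not experience:
        return 0.0
    e = embed(action)
    sims = np.array([float(e @ embed(a)) for a, _ in experience])
    weights = np.exp(sims / temperature)
    values = np.array([v for _, v in experience])
    return float(weights @ values / weights.sum())


def select_action(node, actions, experience, embed, c_uct=1.0, c_lang=0.5):
    """UCT-style selection with an added language-value term (illustrative)."""
    total = 1 + sum(node.visit_count[a] for a in actions)

    def score(a):
        n = node.visit_count[a]
        q = node.value_sum[a] / n if n > 0 else 0.0
        exploration = c_uct * math.sqrt(math.log(total) / (n + 1))
        return q + exploration + c_lang * q_lave(a, experience, embed)

    return max(actions, key=score)
```

The point of the language term is that IF games expose very large text action spaces, so sharing value information across semantically similar actions lets the search prioritize promising actions it has not yet tried at a given node.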
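
Since the experiments run on games from the Jericho environment, here is a minimal sketch of loading and stepping one of them. The ROM path is a placeholder, and the calls follow the public Jericho API (`pip install jericho`), which may differ slightly across versions.

```python
from jericho import FrotzEnv

# Placeholder path: Jericho game ROMs are downloaded separately.
env = FrotzEnv("roms/zork1.z5")

obs, info = env.reset()
print(obs)                      # opening game text

# Jericho can enumerate currently valid text actions; this large,
# state-dependent action set is what the planner searches over.
valid_actions = env.get_valid_actions()
obs, reward, done, info = env.step(valid_actions[0])
print(reward, done, info["score"])
```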