Conceptual Reinforcement Learning for Language-Conditioned Tasks
Authors: Shaohui Peng, Xing Hu, Rui Zhang, Jiaming Guo, Qi Yi, Ruizhi Chen, Zidong Du, Ling Li, Qi Guo, Yunji Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As verified in two challenging environments, RTFM and Messenger, CRL significantly improves training efficiency (by up to 70%) and generalization ability (by up to 30%) under new environment dynamics. To verify the performance of CRL, we evaluate the framework on two challenging benchmarks, RTFM and Messenger. |
| Researcher Affiliation | Collaboration | Shaohui Peng (1,2,3), Xing Hu (1), Rui Zhang (1,3), Jiaming Guo (1,2,3), Qi Yi (1,3,4), Ruizhi Chen (2,5), Zidong Du (1,3), Ling Li (2,5), Qi Guo (1), Yunji Chen (1,2). Affiliations: 1 SKL of Processors, Institute of Computing Technology, CAS; 2 University of Chinese Academy of Sciences; 3 Cambricon Technologies; 4 University of Science and Technology of China; 5 SKL of Computer Science, Institute of Software, CAS |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations, but it does not include a distinct pseudocode block or algorithm section. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We verify CRL on two challenging environments with textual descriptions, RTFM (Zhong et al. 2020) and Messenger (Hanjie et al. 2021), both of which are benchmarks to evaluate the generalization ability of language-conditioned policies to new environment dynamics. |
| Dataset Splits | Yes | RTFM has a training set of environment dynamics (including entities and role assignments) and an independent and identically distributed (i.i.d.) held-out test set. Messenger offers three difficulty stages: message acquiring or delivering only (S1), both acquiring and delivering (S2), and added decoy entities and irrelevant descriptions (S3). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions several components like MLP, GRU, CLUB, and Deep VIB but does not specify the version numbers of any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Table 1: RTFM results in different settings. All results are obtained from 5 random seeds. $\mathcal{L}_{\mathrm{CRL}}(\theta) = \mathcal{L}_{\mathrm{RL}}(\theta) + \alpha_1 \mathcal{L}_{\mathrm{CLUB}}(\theta) + \alpha_2 \mathcal{L}_{\mathrm{VIB}}(\theta)$, where $\mathcal{L}_{\mathrm{RL}}(\theta)$ is the original RL objective and the coefficients $\alpha_1, \alpha_2$ are hyperparameters (details are in Appendix B). A hedged sketch of how these terms combine is given below the table. The details of the environment and the CRL implementation are shown in Appendix A and B. |
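
The training objective quoted above is a weighted sum of the RL loss and two regularizers (CLUB and VIB). Below is a minimal sketch of how such a combined loss could be assembled, assuming PyTorch; the function name `crl_loss`, the way the individual terms are computed, and the coefficient values are illustrative placeholders, since the paper defers its implementation details to Appendix B.

```python
# Sketch of the combined CRL objective:
#   L_CRL(theta) = L_RL(theta) + alpha_1 * L_CLUB(theta) + alpha_2 * L_VIB(theta)
# PyTorch is assumed; the individual loss terms are computed elsewhere in the
# training loop and passed in as tensors. Coefficient defaults are placeholders.
import torch


def crl_loss(rl_loss: torch.Tensor,
             club_loss: torch.Tensor,
             vib_loss: torch.Tensor,
             alpha_1: float = 0.1,
             alpha_2: float = 0.01) -> torch.Tensor:
    """Combine the RL objective with the CLUB and VIB regularizer terms."""
    return rl_loss + alpha_1 * club_loss + alpha_2 * vib_loss


# Hypothetical usage inside a training step:
#   loss = crl_loss(rl_loss, club_loss, vib_loss)
#   optimizer.zero_grad()
#   loss.backward()
#   optimizer.step()
```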