Language Models Meet World Models: Embodied Experiences Enhance Language Models

Authors: Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our approach substantially improves base LMs on 18 downstream tasks by 64.28% on average.
Researcher Affiliation | Academia | UC San Diego, UIUC, MIT, JHU, CMU
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The code is available at https://github.com/szxiangjn/world-model-for-language-model.
Open Datasets | Yes | We instantiate a world model using a virtual household simulator, VirtualHome [36, 37]... evaluate the perplexity on a subset of the Pile [12] test set... For goal-oriented planning, we collected activities and their corresponding target goals with data from RobotHow [36].
Dataset Splits | No | All the hyperparameters are chosen according to the performance on a held-out set. The paper mentions using a 'held-out set' for hyperparameter tuning but does not provide specific details about its size, percentage, or how it was derived from the main dataset.
Hardware Specification | Yes | We used one NVIDIA GeForce RTX 3090 for training.
Software Dependencies | No | The paper mentions techniques like the 'Int8 technique' and the 'AdamW optimizer' but does not specify version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | For both GPT-Neo-1.3B and GPT-J-6B, we use a learning rate of 8 × 10^-5 and a batch size of 20. The weights for plan generation, activity recognition, counting, and object path tracking are 1.0, 0.7, 1.0, and 1.0, respectively. We trained GPT-Neo-1.3B for 3 epochs with the EWC coefficient λ = 0.5 in Equation 4. For GPT-J-6B, we trained it for 5 epochs with λ = 2. With our approach, it takes 40 minutes to train a GPT-Neo and 220 minutes to train a GPT-J. We used a rank of 8 and a coefficient of 32 for LoRA's hyperparameters.
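
For context, the hyperparameters quoted in the Experiment Setup row correspond to a fairly standard LoRA fine-tuning configuration. The sketch below is a minimal, illustrative reconstruction only, assuming HuggingFace Transformers and PEFT (the paper does not state library versions); the model identifiers and variable names are assumptions, the authors' released code may differ, and the EWC penalty from the paper's Equation 4 is represented only by its coefficient rather than implemented.

```python
# Illustrative sketch of the reported fine-tuning setup (GPT-Neo-1.3B case),
# assuming HuggingFace Transformers + PEFT; not the authors' actual script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "EleutherAI/gpt-neo-1.3B"   # or "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters reported in the paper: rank 8, scaling coefficient 32.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=32, task_type="CAUSAL_LM"))

# Reported optimizer settings: AdamW with learning rate 8e-5 and batch size 20.
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-5)
batch_size = 20
num_epochs = 3      # 3 epochs for GPT-Neo-1.3B, 5 for GPT-J-6B
ewc_lambda = 0.5    # EWC coefficient: 0.5 for GPT-Neo, 2 for GPT-J (Equation 4, not shown here)

# Reported weights for the multi-task fine-tuning objective.
task_weights = {
    "plan_generation": 1.0,
    "activity_recognition": 0.7,
    "counting": 1.0,
    "object_path_tracking": 1.0,
}
```

Under these assumptions, only the LoRA adapter weights are updated during fine-tuning, which is consistent with the short reported training times (40 minutes for GPT-Neo-1.3B, 220 minutes for GPT-J-6B) on a single RTX 3090.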