Language Models Meet World Models: Embodied Experiences Enhance Language Models

Authors: Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our approach substantially improves base LMs on 18 downstream tasks by 64.28% on average.
Researcher Affiliation | Academia | UC San Diego, UIUC, MIT, JHU, CMU
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The code is available at https://github.com/szxiangjn/world-model-for-language-model.
Open Datasets | Yes | We instantiate a world model using a virtual household simulator, VirtualHome [36, 37]... evaluate the perplexity on a subset of the Pile [12] test set... For goal-oriented planning, we collected activities and their corresponding target goals with data from RobotHow [36].
Dataset Splits | No | All the hyperparameters are chosen according to the performance on a held-out set. The paper mentions using a 'held-out set' for hyperparameter tuning but does not provide specific details about its size, percentage, or how it was derived from the main dataset.
Hardware Specification | Yes | We used one NVIDIA GeForce RTX 3090 for training.
Software Dependencies | No | The paper mentions techniques like the 'Int8 technique' and the 'AdamW optimizer' but does not specify version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | For both GPT-Neo-1.3B and GPT-J-6B, we use a learning rate of 8 × 10^-5 and a batch size of 20. The weights for plan generation, activity recognition, counting, and object path tracking are 1.0, 0.7, 1.0, and 1.0, respectively. We trained GPT-Neo-1.3B for 3 epochs with the EWC coefficient λ = 0.5 in Equation 4. For GPT-J-6B, we trained it for 5 epochs with λ = 2. With our approach, it takes 40 minutes to train a GPT-Neo and 220 minutes to train a GPT-J. We used a rank of 8 and a coefficient of 32 for LoRA's hyperparameters.
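
For context, the hyperparameters quoted in the Experiment Setup row correspond to a fairly standard LoRA fine-tuning configuration. The sketch below is a minimal, illustrative reconstruction only, assuming HuggingFace Transformers and PEFT (the paper does not state library versions); the model identifiers and variable names are assumptions, the authors' released code may differ, and the EWC penalty from the paper's Equation 4 is represented only by its coefficient rather than implemented.

```python
# Illustrative sketch of the reported fine-tuning setup (GPT-Neo-1.3B case),
# assuming HuggingFace Transformers + PEFT; not the authors' actual script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "EleutherAI/gpt-neo-1.3B"   # or "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA hyperparameters reported in the paper: rank 8, scaling coefficient 32.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=32, task_type="CAUSAL_LM"))

# Reported optimizer settings: AdamW with learning rate 8e-5 and batch size 20.
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-5)
batch_size = 20
num_epochs = 3      # 3 epochs for GPT-Neo-1.3B, 5 for GPT-J-6B
ewc_lambda = 0.5    # EWC coefficient: 0.5 for GPT-Neo, 2 for GPT-J (Equation 4, not shown here)

# Reported weights for the multi-task fine-tuning objective.
task_weights = {
    "plan_generation": 1.0,
    "activity_recognition": 0.7,
    "counting": 1.0,
    "object_path_tracking": 1.0,
}
```

Under these assumptions, only the LoRA adapter weights are updated during fine-tuning, which is consistent with the short reported training times (40 minutes for GPT-Neo-1.3B, 220 minutes for GPT-J-6B) on a single RTX 3090.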