Training language models to summarize narratives improves brain alignment
Authors: Khai Loong Aw, Mariya Toneva
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the alignment of the base and booksum models with fMRI recordings of 8 participants reading a chapter of a popular book word-by-word, made publicly available by Wehbe et al. (2014a). Our main contributions are as follows: 1. In Section 4, we show that training language models for deeper narrative understanding improves alignment to human brain activity. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Software Systems 2Singapore Management University |
| Pseudocode | Yes | Figure 3: Left. Interpretability approach to compare Pearson correlation brain alignment for fMRI samples corresponding to various discourse features. Right. Pearson correlation averages for three discourse features. Averages were computed over 8 layers for each model, sequence lengths 20 to 500, and all 8 participants. NLP models have greater brain alignment for Characters than other discourse features. When trained to summarize narratives, the models improve their brain alignment significantly for all discourse features (paired t-test, FDR corrected for multiple comparisons). However, it improves more for Characters than other discourse features. Note that the average correlations shown here are low in magnitude as they include a large number of brain voxels that may not be significantly involved in brain-NLP alignment or language processing, as well as many layers and sequence lengths. Algorithm 1: Interpretability approach to compare brain alignment across discourse features. |
| Open Source Code | Yes | Code available at https://github.com/awwkl/brain_language_summarization. |
| Open Datasets | Yes | We use a publicly available brain dataset (Wehbe et al., 2014a) consisting of fMRI recordings of 8 participants reading chapter 9 of the book Harry Potter and the Sorcerer's Stone (Rowling et al., 1998). We specifically investigate 4 pretrained language models (i.e., "base models") and 4 corresponding models obtained by training the base models on the BookSum dataset (Kryscinski et al., 2021) to improve the base language model's narrative understanding (i.e., "booksum models"). |
| Dataset Splits | Yes | Since the fMRI data was collected in 4 runs of approximately equal length, we use 4-fold cross-validation where each fold corresponds to holding out one run of fMRI data for testing. First, we split our Harry Potter text dataset into a train and test set (75% and 25% of the text respectively). |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or other compute infrastructure used for running the experiments. |
| Software Dependencies | No | The paper mentions using “Hugging Face” models but does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or specific library versions. |
| Experiment Setup | Yes | We select the ridge parameter via nested cross-validation. First, we reduce the dimensionality of the word-level NLP representations R^(5176×d) using PCA and retain the top 10 principal components (more than 75% of the variance) to result in a matrix R^(5176×10). |
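The encoding pipeline quoted in the table (PCA to 10 components, ridge regression with a nested-CV-selected penalty, 4-fold cross-validation over fMRI runs, Pearson correlation as the alignment score) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the data is synthetic, scikit-learn stands in for whatever tooling they used, and the feature dimension, voxel count, and alpha grid are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Synthetic stand-ins: 5176 word-level NLP representations (d=768 assumed)
# and matching fMRI voxel responses (50 voxels here, for brevity).
X = rng.standard_normal((5176, 768))
Y = rng.standard_normal((5176, 50))

# Reduce the NLP representations to their top 10 principal components.
X10 = PCA(n_components=10).fit_transform(X)

# 4-fold CV, one fold per fMRI run; RidgeCV selects the ridge penalty
# by inner cross-validation on each training split (nested CV).
alphas = np.logspace(-2, 4, 7)  # illustrative grid
fold_scores = []
for train, test in KFold(n_splits=4).split(X10):
    model = RidgeCV(alphas=alphas).fit(X10[train], Y[train])
    pred = model.predict(X10[test])
    # Brain alignment: Pearson correlation per voxel between predicted
    # and held-out fMRI responses, averaged over voxels.
    r = [np.corrcoef(pred[:, v], Y[test, v])[0, 1] for v in range(Y.shape[1])]
    fold_scores.append(float(np.mean(r)))

mean_alignment = float(np.mean(fold_scores))
```

With random data the mean correlation hovers near zero; on real features and recordings, significance is then assessed per voxel, as the paper describes.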