Divergences between Language Models and Human Brains
Authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Michael Tarr, Leila Wehbe
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. |
| Researcher Affiliation | Academia | Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe; Carnegie Mellon University; {zhouyuchen,emmy,gneubig,michaeltarr,lwehbe}@cmu.edu |
| Pseudocode | Yes | Algorithm 1 Permutation test (for one channel, one time window) |
| Open Source Code | Yes | Data and code are available at: https://github.com/FlamingoZh/divergence_MEG |
| Open Datasets | Yes | The first dataset [Wehbe et al., 2014a] has eight participants reading Chapter 9 of Harry Potter and the Sorcerer's Stone (5,176 words) and four participants reading Chapter 10 of the same book (4,475 words). ... To enhance reproducibility and generalizability of our study, we additionally collected MEG data from one participant who listened to six narratives (11,626 words) from The Moth, a platform featuring personal storytelling. These stories were chosen from the stimuli used in a published story-listening fMRI dataset [LeBel et al., 2023]. |
| Dataset Splits | Yes | Therefore, we implemented a 10-fold cross-validation procedure that splits the MEG data into 10 continuous chunks. ... The regularization parameters were chosen via nested cross-validation. |
| Hardware Specification | Yes | GPT-2 XL was trained separately on each of the two datasets in subsection 5.1 on 4 A6000 GPUs with 16-bit quantization and a batch size of 1 per GPU. |
| Software Dependencies | No | The paper mentions 'DeepSpeed with ZeRO stage 2 optimization' and 'the Adam optimizer' but does not specify version numbers for these or other software components such as Python or deep learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | The Adam optimizer was used with a learning rate of 1e-5, betas of (0.9, 0.999), epsilon of 1e-8, and no weight decay. Models were trained with early stopping with a patience of 3 [Kingma and Ba, 2017]. |
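The "Pseudocode" row above refers to the paper's Algorithm 1, a permutation test run separately for each MEG channel and time window. The paper's exact algorithm is not reproduced in this report; the sketch below shows a generic sign-flip permutation test on paired performance scores, which is one common way such a per-channel, per-window test is implemented. All names (`permutation_test`, `scores_a`, `scores_b`) and the choice of test statistic are illustrative assumptions, not the authors' code.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_permutations=5000, seed=0):
    """Sign-flip permutation test on the mean paired difference between two
    score arrays (e.g., encoding performance from two feature sets) for a
    single MEG channel and time window. Illustrative sketch only; not the
    paper's Algorithm 1."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)  # paired differences
    observed = diffs.mean()

    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the two conditions are exchangeable,
        # so each paired difference can have its sign flipped at random.
        signs = rng.choice([-1.0, 1.0], size=diffs.shape)
        if (signs * diffs).mean() >= observed:
            count += 1

    # Add-one smoothing keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

In practice such a test would be repeated over all channels and time windows, followed by a multiple-comparison correction.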
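The "Dataset Splits" row describes 10-fold cross-validation over continuous chunks of the MEG recording, with regularization chosen by nested cross-validation. A minimal sketch of that scheme, assuming a ridge encoding model: `KFold(shuffle=False)` yields contiguous folds in time order, and `RidgeCV` performs the inner loop. The alpha grid, the number of inner folds, and the correlation-based scoring are assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def contiguous_chunk_cv(X, Y, n_folds=10, alphas=np.logspace(-1, 6, 8)):
    """Outer 10-fold CV over contiguous chunks, so temporally adjacent MEG
    samples stay in the same fold; inner (nested) CV picks the ridge alpha.
    Hyperparameter grid and scoring are illustrative assumptions."""
    outer = KFold(n_splits=n_folds, shuffle=False)  # contiguous chunks in time order
    fold_scores = []
    for train_idx, test_idx in outer.split(X):
        # Nested cross-validation over the training chunks selects the
        # regularization strength.
        model = RidgeCV(alphas=alphas, cv=5)
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        # Pearson correlation between predicted and measured responses,
        # averaged over output dimensions (e.g., MEG sensors).
        r = [np.corrcoef(pred[:, j], Y[test_idx, j])[0, 1] for j in range(Y.shape[1])]
        fold_scores.append(np.mean(r))
    return float(np.mean(fold_scores))
```

Keeping the folds contiguous (rather than shuffled) avoids leaking temporally autocorrelated MEG samples between training and test sets.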
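The "Hardware Specification", "Software Dependencies", and "Experiment Setup" rows together describe fine-tuning GPT-2 XL with Adam (learning rate 1e-5, betas (0.9, 0.999), epsilon 1e-8, no weight decay), early stopping with patience 3, DeepSpeed ZeRO stage 2, 16-bit precision, and a batch size of 1 per GPU on 4 A6000s. The sketch below shows how that configuration might be expressed with the Hugging Face `Trainer`; the output paths, evaluation cadence, DeepSpeed dictionary, and dataset preparation (omitted here) are assumptions, since the paper does not publish its exact configuration files.

```python
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                          TrainingArguments, EarlyStoppingCallback)

# Optimizer, precision, and batch-size values follow the quoted text; the
# DeepSpeed dict and remaining arguments are illustrative assumptions.
deepspeed_config = {
    "zero_optimization": {"stage": 2},    # ZeRO stage 2 state sharding
    "fp16": {"enabled": True},            # 16-bit training
    "train_micro_batch_size_per_gpu": 1,  # batch size 1 per GPU (4 GPUs total)
}

args = TrainingArguments(
    output_dir="gpt2xl_finetuned",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    weight_decay=0.0,
    fp16=True,
    deepspeed=deepspeed_config,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # tokenized narrative text, prepared elsewhere
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```

With `weight_decay=0.0`, the Trainer's default AdamW optimizer reduces to plain Adam, matching the "no weight decay" detail in the quoted setup.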