Divergences between Language Models and Human Brains

Authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Michael Tarr, Leila Wehbe

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories.
Researcher Affiliation | Academia | Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe; Carnegie Mellon University; {zhouyuchen,emmy,gneubig,michaeltarr,lwehbe}@cmu.edu
Pseudocode | Yes | Algorithm 1: Permutation test (for one channel, one time window); a sketch of such a test appears after this table.
Open Source Code | Yes | Data and code are available at: https://github.com/FlamingoZh/divergence_MEG
Open Datasets | Yes | The first dataset [Wehbe et al., 2014a] has eight participants reading Chapter 9 of Harry Potter and the Sorcerer's Stone (5,176 words) and four participants reading Chapter 10 of the same book (4,475 words). ... To enhance reproducibility and generalizability of our study, we additionally collected MEG data from one participant who listened to six narratives (11,626 words) from The Moth, a platform featuring personal storytelling. These stories were chosen from the stimuli used in a published story listening fMRI dataset [LeBel et al., 2023].
Dataset Splits | Yes | Therefore, we implemented a 10-fold cross-validation procedure that splits the MEG data into 10 continuous chunks. ... The regularization parameters were chosen via nested cross-validation. (A sketch of this splitting scheme appears after this table.)
Hardware Specification | Yes | GPT-2 XL was trained separately on each of the two datasets in subsection 5.1 on 4 A6000 GPUs with 16-bit quantization and a batch size of 1 per GPU.
Software Dependencies | No | The paper mentions 'DeepSpeed with ZeRO stage 2 optimization' and 'the Adam optimizer' but does not specify version numbers for these or for other software components such as Python or the deep learning framework (e.g., PyTorch or TensorFlow).
Experiment Setup | Yes | The Adam optimizer was used with a learning rate of 1e-5, betas of (0.9, 0.999), epsilon of 1e-8, and no weight decay. Models were trained with early stopping with a patience of 3 [Kingma and Ba, 2017]. (A hypothetical training configuration combining these settings with the hardware details appears after this table.)
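
The Pseudocode row cites Algorithm 1, a permutation test run independently for each MEG channel and time window. The paper's exact procedure is given in Algorithm 1 and the released repository; the sketch below only illustrates a block-permutation test of that general kind, assuming the test statistic is the correlation between predicted and measured responses. The function name, `block_size`, and `n_permutations` are illustrative and not taken from the paper.

```python
import numpy as np

def block_permutation_test(y_true, y_pred, n_permutations=5000, block_size=20, seed=0):
    """Permutation test for one MEG channel and one time window (sketch).

    Builds a null distribution by permuting contiguous blocks of the
    predictions, which preserves local temporal autocorrelation, and
    compares the observed prediction correlation against that null.
    """
    rng = np.random.default_rng(seed)

    # Trim to a whole number of blocks so the block reshape is exact.
    n = (len(y_pred) // block_size) * block_size
    y_true, y_pred = np.asarray(y_true[:n]), np.asarray(y_pred[:n])

    observed = np.corrcoef(y_true, y_pred)[0, 1]

    blocks = np.arange(n).reshape(-1, block_size)  # one row of indices per block
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        perm_idx = blocks[rng.permutation(len(blocks))].ravel()
        null[i] = np.corrcoef(y_true, y_pred[perm_idx])[0, 1]

    # One-sided p-value with the +1 correction for the observed statistic.
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p_value
```

Shuffling blocks rather than individual samples keeps the null distribution honest in the presence of the strong temporal autocorrelation typical of MEG recordings.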
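
The Dataset Splits row describes 10-fold cross-validation over contiguous chunks of the MEG time series, with regularization chosen by nested cross-validation. The sketch below shows one common way to set this up; it assumes a ridge encoding model and uses scikit-learn's `RidgeCV` for the inner selection of the penalty. The `alphas` grid and the function names are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def contiguous_folds(n_samples, n_folds=10):
    """Split time indices into contiguous chunks so temporally adjacent
    MEG samples never straddle a train/test boundary."""
    return np.array_split(np.arange(n_samples), n_folds)

def encoding_cv_score(X, Y, n_folds=10, alphas=np.logspace(-1, 6, 8)):
    """10-fold CV over contiguous chunks; the ridge penalty is selected by
    nested cross-validation on each training split (here via RidgeCV).
    Returns the mean prediction correlation over folds and outputs."""
    folds = contiguous_folds(len(X), n_folds)
    fold_scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        model = RidgeCV(alphas=alphas)  # inner loop chooses the penalty
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        # Correlation between predicted and held-out responses, per output.
        r = [np.corrcoef(pred[:, j], Y[test_idx, j])[0, 1] for j in range(Y.shape[1])]
        fold_scores.append(np.mean(r))
    return float(np.mean(fold_scores))
```

Splitting into contiguous chunks, rather than shuffling samples, avoids leakage between temporally adjacent training and test points.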
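
The Hardware Specification, Software Dependencies, and Experiment Setup rows together describe fine-tuning GPT-2 XL with DeepSpeed ZeRO stage 2, 16-bit precision, a batch size of 1 per GPU, and Adam with the listed hyperparameters, plus early stopping with a patience of 3. Below is a hypothetical configuration consistent with those numbers; the key names follow the standard DeepSpeed config schema, but any value not quoted in the paper (e.g., `gradient_accumulation_steps`) is a placeholder, as is the early-stopping helper.

```python
# Hypothetical DeepSpeed configuration consistent with the reported setup:
# ZeRO stage 2, fp16 training, a batch size of 1 per GPU, and Adam with
# lr=1e-5, betas=(0.9, 0.999), eps=1e-8, and no weight decay.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,        # placeholder, not stated in the paper
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-5, "betas": [0.9, 0.999], "eps": 1e-8, "weight_decay": 0.0},
    },
}

def should_stop(val_losses, patience=3):
    """Early stopping with a patience of 3 (sketch): stop once the best
    validation loss has not improved for `patience` consecutive evaluations."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(v >= best_before for v in val_losses[-patience:])
```

In practice the config dict would be serialized to JSON and handed to DeepSpeed at initialization, and the early-stopping helper would be consulted after each validation pass.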