Scaling laws for language encoding models in fMRI
Authors: Richard Antonello, Aditya Vaidya, Alexander Huth
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we test whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with 15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. |
| Researcher Affiliation | Academia | Richard J. Antonello Department of Computer Science The University of Texas at Austin rjantonello@utexas.edu Aditya R. Vaidya Department of Computer Science The University of Texas at Austin avaidya@utexas.edu Alexander G. Huth Departments of Computer Science and Neuroscience The University of Texas at Austin huth@cs.utexas.edu |
| Pseudocode | No | The paper describes methodological steps and formulas but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | Yes | We have released code as well as selected precomputed features, model weights, and model predictions generated for this paper. These data are available at https://github.com/HuthLab/encoding-model-scaling-laws. |
| Open Datasets | Yes | We used publicly available functional magnetic resonance imaging (fMRI) data collected from 3 human subjects as they listened to 20 hours of English language podcast stories over Sensimetrics S14 headphones [43, 44]. |
| Dataset Splits | Yes | For every even-numbered non-embedding layer l in the Whisper model, as well as the 18th layer of the 33 billion LLaMA model, we held out 20% of the training data and built an encoding model using the remaining 80% of the training data. This was repeated for each of 5 folds. |
| Hardware Specification | Yes | Ridge regression was performed using compute nodes with 128 cores (2 AMD EPYC 7763 64-core processors) and 256GB of RAM. ... Feature extraction from language and speech models was performed on specialized GPU nodes that were the same as the previously-described compute nodes but with 3 NVIDIA A100 40GB cards. |
| Software Dependencies | No | The paper mentions a 'quadratic program solver [47]' but does not provide specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used for the experiments. |
| Experiment Setup | Yes | First, activations for each word in the stimulus text were extracted from each layer of each LM. ... We use time delays of 2, 4, 6, and 8 seconds of the representation to generate this temporal transformation. ... For a given story, contexts were grown until they reached 512 tokens, then reset to a new context of 256 tokens. |
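The temporal transformation quoted in the Experiment Setup row (delays of 2, 4, 6, and 8 seconds) is the standard finite-impulse-response trick for fMRI encoding models: stack time-shifted copies of the stimulus features so ridge regression can fit a per-feature hemodynamic response. Below is a minimal sketch, not the authors' released code; the function name and the TR value of 2 s (which makes the four delays shifts of 1-4 samples) are assumptions for illustration.

```python
import numpy as np

def make_delayed(features, delays):
    """Stack time-delayed copies of a feature matrix.

    features: (n_timepoints, n_features) array sampled at the fMRI TR.
    delays:   shifts in TRs; each delayed copy is zero-padded at the start.
    Returns an (n_timepoints, n_features * len(delays)) design matrix.
    """
    n_tr, n_feat = features.shape
    delayed = np.zeros((n_tr, n_feat * len(delays)))
    for i, d in enumerate(delays):
        if d > 0:
            delayed[d:, i * n_feat:(i + 1) * n_feat] = features[:-d]
        else:
            delayed[:, i * n_feat:(i + 1) * n_feat] = features
    return delayed

# With a TR of 2 s, delays of 2, 4, 6, 8 seconds are shifts of 1-4 TRs.
F = np.random.randn(100, 16)          # toy layer activations, downsampled to TRs
X = make_delayed(F, delays=[1, 2, 3, 4])
# X.shape == (100, 64)
```

The resulting design matrix `X` is what the quoted ridge regression would be fit on, one weight per feature per delay.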
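The context-handling rule quoted above (grow the context to 512 tokens, then reset to a new 256-token context) can be sketched as a simple generator. This is an illustrative reading of the quoted procedure, not the released implementation; the function name and the choice to keep the most recent 256 tokens on reset are assumptions.

```python
def context_chunks(tokens, max_len=512, reset_len=256):
    """Yield the context window used for each token position.

    The window grows with the story until it would exceed max_len tokens,
    then resets so the new window holds the most recent reset_len tokens.
    """
    start = 0
    for i in range(len(tokens)):
        if (i - start + 1) > max_len:
            start = i - reset_len + 1   # reset: keep the last reset_len tokens
        yield tokens[start:i + 1]

# Each yielded chunk is the context fed to the LM for one token's activation.
chunks = list(context_chunks(list(range(1000))))
```

Under this reading, no context ever exceeds 512 tokens, and each reset leaves 256 tokens of carried-over context so activations never start from an empty window mid-story.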