Human alignment of neural network representations

Authors: Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen, Simon Kornblith

ICLR 2023

Reproducibility assessment. Each entry below lists the reproducibility variable, its result, and the LLM response given as evidence.

Research Type: Experimental
LLM Response: In this paper, we investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses. We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses, whereas the training dataset and objective function both have a much larger impact. These findings are consistent across three datasets of human similarity judgments collected using two different tasks.

Researcher Affiliation: Collaboration
LLM Response: Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen (Machine Learning Group, Technische Universität Berlin & BIFOLD, Berlin, Germany); Simon Kornblith (Google Research, Brain Team)

Pseudocode: Yes
LLM Response: Algorithm 1: Algorithm for object partitioning during k-fold CV

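For intuition, here is a minimal Python sketch of what such object-level partitioning for k-fold CV could look like. It reflects our reading of the procedure the paper describes, not the authors' code; all names are illustrative.

```python
import numpy as np

def partition_triplets(triplets, n_objects, k, fold, seed=0):
    """Object-level k-fold split: a triplet is kept for training only if
    all three of its objects fall in the train partition, and for testing
    only if all three fall in the held-out partition."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_objects), k)
    test_objects = set(folds[fold].tolist())
    train_objects = set(range(n_objects)) - test_objects
    train, test = [], []
    for t in triplets:  # t is a tuple of three object indices
        s = set(t)
        if s <= train_objects:
            train.append(t)
        elif s <= test_objects:
            test.append(t)
        # triplets mixing train and test objects are discarded
    return train, test
```

Splitting at the object level rather than the triplet level ensures no object appears in both train and test triplets, which is what makes the resulting accuracy estimate minimally biased.
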
Open Source Code: No
LLM Response: The paper mentions using 'thingsvision, a Python library for extracting activations from neural nets (Muttenthaler & Hebart, 2021)' and 'model weights from the VISSL library (Goyal et al., 2021)' or 'official GitHub repositories', but does not explicitly state that the code for the authors' own experiments and analysis is open source.

Open Datasets: Yes
LLM Response: Our primary analyses use images and corresponding human odd-one-out triplet judgments from the THINGS dataset (Hebart et al., 2019). We additionally consider two datasets of images with human similarity judgments obtained from a multi-arrangement task (Cichy et al., 2019; King et al., 2019).

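In the odd-one-out task, a participant sees three images and picks the one that belongs least with the other two; a model's choice can be read off the pairwise similarities of its representations. Below is a generic sketch assuming dot-product similarity; the function and data are illustrative, not taken from the paper's code.

```python
import torch

def odd_one_out(z_i, z_j, z_k):
    """Return which of three representation vectors is the odd one out:
    the most similar pair is kept, and the remaining item is odd."""
    sims = {
        'k': torch.dot(z_i, z_j),  # if (i, j) are most similar, k is odd
        'j': torch.dot(z_i, z_k),
        'i': torch.dot(z_j, z_k),
    }
    return max(sims, key=sims.get)

z = torch.randn(3, 128)  # stand-in features for three images
print(odd_one_out(z[0], z[1], z[2]))
```
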
Dataset Splits: Yes
LLM Response: To obtain a minimally biased estimate of the odd-one-out accuracy of a linear probe, we partition the m objects into two disjoint sets. Experimental details about the optimization process, k-fold CV, and how we partition the objects can be found in Appendix A.1 and Algorithm 1, respectively. ... We decided to proceed with 3-fold CV in our final experiments since using 2/3 of the objects for training and 1/3 for testing resulted in a proportionally larger test set than using 3/4 for training and 1/4 for testing (≈433k train and ≈54k test triplets for 3-fold CV vs. ≈616k train and ≈23k test triplets for 4-fold CV).

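These counts are consistent with a simple back-of-the-envelope check: under object-level partitioning a triplet survives only if all three of its objects land in the same partition, so roughly ((k-1)/k)^3 of triplets are usable for training and (1/k)^3 for testing. The calculation below is our own rough check, assuming objects are spread approximately uniformly across triplets.

```python
# Expected usable fractions of triplets under object-level k-fold CV,
# assuming objects appear roughly uniformly across triplets.
for k in (3, 4):
    train_frac = ((k - 1) / k) ** 3
    test_frac = (1 / k) ** 3
    print(f"k={k}: ~{train_frac:.1%} train, ~{test_frac:.1%} test")
# k=3: ~29.6% train, ~3.7% test
# k=4: ~42.2% train, ~1.6% test
# The implied test-set ratio (1/27)/(1/64) ≈ 2.4 matches 54k vs. 23k.
```
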
Hardware Specification: No
LLM Response: The paper does not specify the hardware used for the experiments, such as GPU or CPU models.

Software Dependencies: No
LLM Response: For most of our evaluation and linear probing experiments, we use PyTorch (Paszke et al., 2019).

Experiment Setup: Yes
LLM Response: We optimized the transformation matrix W via gradient descent, using Adam (Kingma & Ba, 2015) with a learning rate of η = 0.001. We performed a grid search over the learning rate η, where η ∈ {0.0001, 0.001, 0.01}, and found 0.001 to work best for all models in Table B.1. ... To find the optimal strength of the ℓ2 regularization for each linear probe, we performed a grid search over λ for each k value individually. The optimal λ varied between models, where λ ∈ {0.0001, 0.001, 0.01, 0.1, 1}.

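To make the quoted setup concrete, here is a minimal PyTorch sketch of how such a linear probe could be optimized: a matrix W is learned over frozen features with Adam (η = 0.001) and an ℓ2 penalty of strength λ. The loss layout and all data are illustrative stand-ins reflecting our reading of the paper, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def probe_loss(W, feats, triplets, lam):
    """Cross-entropy over the three pairwise dot-product similarities of a
    triplet; the first two indices of each triplet are the similar pair."""
    z = feats[triplets] @ W               # (n, 3, d) transformed features
    sim_ij = (z[:, 0] * z[:, 1]).sum(-1)  # correct pair
    sim_ik = (z[:, 0] * z[:, 2]).sum(-1)
    sim_jk = (z[:, 1] * z[:, 2]).sum(-1)
    logits = torch.stack([sim_ij, sim_ik, sim_jk], dim=1)
    targets = torch.zeros(len(triplets), dtype=torch.long)
    return F.cross_entropy(logits, targets) + lam * W.pow(2).sum()

# Toy stand-ins for frozen network features and human triplet judgments.
feats = torch.randn(100, 64)
triplets = torch.randint(0, 100, (512, 3))

W = torch.eye(64, requires_grad=True)  # transformation matrix W
opt = torch.optim.Adam([W], lr=1e-3)   # eta = 0.001, as quoted above
for step in range(200):
    opt.zero_grad()
    probe_loss(W, feats, triplets, lam=1e-2).backward()
    opt.step()
```

In practice one would run this once per λ in the quoted grid and per CV fold, selecting the λ that maximizes held-out odd-one-out accuracy, mirroring the grid search the response describes.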