Harmonizing the object recognition strategies of deep neural networks with humans

Authors: Thomas Fel, Ivan F. Rodriguez Rodriguez, Drew Linsley, Thomas Serre

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across 84 different DNNs trained on ImageNet and three independent datasets measuring the where and the how of human visual strategies for object recognition on those images, we find a systematic trade-off between DNN categorization accuracy and alignment with human visual strategies for object recognition. |
| Researcher Affiliation | Academia | (1) Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, RI; (2) Artificial and Natural Intelligence Toulouse Institute (ANITI), Toulouse, France; (3) Carney Institute for Brain Science, Brown University, Providence, RI |
| Pseudocode | No | The paper describes the neural harmonizer loss function and training process, but it does not include a formal pseudocode block or algorithm steps. |
| Open Source Code | Yes | We release our code and data at https://serre-lab.github.io/Harmonization to help the field build more human-like DNNs. |
| Open Datasets | Yes | We focused on the ImageNet dataset to compare the visual strategies of humans and DNNs for object recognition at scale. We relied on the two significant efforts for gathering feature importance data from humans on ImageNet: the Clicktionary [22] and ClickMe [20] games, which use slightly different methods to collect their data. |
| Dataset Splits | Yes | Our experiments measure the alignment between human and DNN visual strategies using ClickMe and Clicktionary feature importance maps captured on the ImageNet validation set. As we describe in Section 4, ClickMe feature importance maps from the ImageNet training set are used to implement our neural harmonizer. |
| Hardware Specification | Yes | Models were trained using 8-core v4 TPUs on the Google Cloud Platform, and training lasted approximately one day. |
| Software Dependencies | Yes | We used pretrained weights for each of these models supplied by their authors, with a variety of licenses (detailed in SI 2), implemented in TensorFlow 2.0, Keras, or PyTorch. |
| Experiment Setup | Yes | Models were trained with an augmented ResNet training recipe (built from https://github.com/tensorflow/tpu/). Models were optimized with SGD and momentum over batches of 512 images, a learning rate of 0.3, and label smoothing [90]. Images were augmented with random left-right flips and mixup [91]. The learning rate was adjusted over the course of training with a schedule that began with an initial warm-up period of 5 epochs and then decayed according to a cosine function over 90 epochs, with decay at steps 30, 50, and 80. |
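The "Pseudocode" row notes that the paper describes the neural harmonizer loss in prose only: a standard classification objective is augmented with a term that pulls the model's gradient-based feature-importance maps toward human ClickMe maps at multiple spatial scales. The sketch below is an illustrative reconstruction, not the paper's exact recipe — the pyramid depth (`levels=3`), the 2x2 average pooling in place of a Gaussian pyramid, and the min-max normalization are all assumptions.

```python
import numpy as np

def avg_pool2(x):
    """Downsample a 2-D map by 2x2 average pooling (assumes even spatial dims)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def normalize(x, eps=1e-8):
    """Min-max scale a map so human and model maps are comparable."""
    x = x - x.min()
    return x / (x.max() + eps)

def harmonization_alignment(model_saliency, human_map, levels=3):
    """Mean squared error between normalized importance maps, accumulated
    across progressively coarser scales (a crude stand-in for a pyramid)."""
    loss = 0.0
    m, h = model_saliency, human_map
    for _ in range(levels):
        loss += np.mean((normalize(m) - normalize(h)) ** 2)
        m, h = avg_pool2(m), avg_pool2(h)
    return loss / levels
```

In training, a term like this (computed on the model's input gradients) would be added to the cross-entropy loss with some weighting coefficient; the weighting used in the paper is not quoted in the table above.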
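The "Dataset Splits" row says alignment between human and DNN visual strategies is measured on held-out feature-importance maps. One standard way to score such alignment is a rank correlation between the flattened maps; the numpy sketch below assumes a Spearman-style correlation without tie handling, and is not necessarily the paper's exact metric.

```python
import numpy as np

def rank(x):
    """Assign each value its rank order (no tie averaging; fine for continuous maps)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

def spearman_alignment(model_map, human_map):
    """Spearman-style rank correlation between two flattened importance maps."""
    a, b = rank(model_map.ravel()), rank(human_map.ravel())
    a -= a.mean()
    b -= b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

A score of 1 means the model ranks image regions by importance exactly as humans do, and -1 means the rankings are reversed.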
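The "Experiment Setup" row describes a learning-rate schedule with a 5-epoch warm-up followed by cosine decay over 90 epochs. A minimal sketch of that shape is below; it covers only the warm-up and cosine components (the additional decays at steps 30, 50, and 80 mentioned in the quote are omitted, and per-epoch rather than per-step granularity is an assumption).

```python
import math

def learning_rate(epoch, base_lr=0.3, warmup_epochs=5, total_epochs=90):
    """Linear warm-up to base_lr, then cosine decay toward zero."""
    if epoch < warmup_epochs:
        # Ramp linearly so the first epoch does not use the full rate.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With the quoted settings, the rate ramps from 0.06 to 0.3 over the first 5 epochs and decays smoothly to near zero by epoch 90.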