MEWL: Few-shot multimodal word learning with referential uncertainty

Authors: Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we deploy MEWL to analyze machines' and humans' ability to perform few-shot word learning under the nine scenarios. We first benchmark machines on MEWL by analyzing multimodal (i.e., pre-trained vision-language) and unimodal models (i.e., Large Language Models (LLMs)). Our experimental results indicate that pre-trained vision-language models struggle to learn word meaning with only a few examples, lagging far behind what humans can do."
Researcher Affiliation | Academia | 1) Institute for AI, Peking University; 2) Yuanpei College, Peking University; 3) National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence; 4) School of Computer Science & Technology, Beijing Institute of Technology; 5) School of EECS, Peking University; 6) Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing; 7) School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University.
Pseudocode | No | The paper describes the methods and procedures used, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data: https://github.com/jianggy/MEWL.
Open Datasets | Yes | Code and data: https://github.com/jianggy/MEWL. ... "MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing."
Dataset Splits | Yes | "MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing."
Hardware Specification | Yes | "All experiments run on eight NVIDIA A100 80GB GPUs."
Software Dependencies | No | The paper mentions software like CLIP, OPT, GPT-3.5 (text-davinci-003), and BERT, but it does not specify version numbers for general software dependencies (e.g., programming languages, or deep learning frameworks and libraries such as PyTorch or TensorFlow).
Experiment Setup | Yes | For CLIP: "The model is trained on the training set for 600 epochs, dropout 0.1, batch size 64, learning rate 1 × 10⁻⁴, and AdamW optimizer (weight decay 0.01)." For Aloe: "training 200 epochs, learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 128." For Flamingo: "training steps 30,000 (≈106 epochs), learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 96." For BERT: "We fine-tune a BERT-base model on the training set for 200 epochs, with learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 64."
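The hyperparameters in the Experiment Setup row can be collected into a single place for a reproduction attempt. The sketch below is illustrative only: the dictionary structure, key names, and the `linear_decay_lr` helper are our own conventions, not from the paper; only the numeric values (epochs/steps, learning rates, optimizer settings, batch sizes) are taken from the quoted text.

```python
# Hypothetical config sketch for the four baselines; structure and names are
# ours, values are the ones quoted in the Experiment Setup row above.
CONFIGS = {
    "clip": {"epochs": 600, "dropout": 0.1, "batch_size": 64,
             "lr": 1e-4, "optimizer": "AdamW", "weight_decay": 0.01},
    "aloe": {"epochs": 200, "batch_size": 128, "lr": 5e-5,
             "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
             "lr_schedule": "linear"},
    "flamingo": {"steps": 30_000, "batch_size": 96, "lr": 5e-5,
                 "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
                 "lr_schedule": "linear"},
    "bert": {"epochs": 200, "batch_size": 64, "lr": 5e-5,
             "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
             "lr_schedule": "linear"},
}

def linear_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Linear learning-rate decay from base_lr down to 0 over total_steps,
    one common reading of the 'linear learning rate decay' quoted above."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

For example, under this schedule the Flamingo learning rate halfway through training (step 15,000 of 30,000) would be half the base rate, i.e. 2.5 × 10⁻⁵.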