MEWL: Few-shot multimodal word learning with referential uncertainty
Authors: Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we deploy MEWL to analyze machines' and humans' ability to perform few-shot word learning under the nine scenarios. We first benchmark machines on MEWL by analyzing multimodal (i.e., pre-trained vision-language) and unimodal models (i.e., Large Language Models (LLMs)). Our experimental results indicate that pre-trained vision-language models struggle to learn word meaning with only a few examples, lagging far behind what humans can do. |
| Researcher Affiliation | Academia | (1) Institute for AI, Peking University; (2) Yuanpei College, Peking University; (3) National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence; (4) School of Computer Science & Technology, Beijing Institute of Technology; (5) School of EECS, Peking University; (6) Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing; (7) School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University. |
| Pseudocode | No | The paper describes the methods and procedures used, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data: https://github.com/jianggy/MEWL. |
| Open Datasets | Yes | Code and data: https://github.com/jianggy/MEWL. ... MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing. |
| Dataset Splits | Yes | MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing. |
| Hardware Specification | Yes | All experiments run on eight NVIDIA A100 80GB GPUs. |
| Software Dependencies | No | The paper mentions software like CLIP, OPT, GPT-3.5 (text-davinci-003), and BERT, but it does not specify version numbers for general software dependencies (e.g., programming languages, deep learning frameworks like PyTorch or TensorFlow, or other libraries). |
| Experiment Setup | Yes | For CLIP: 'The model is trained on the training set for 600 epochs, dropout 0.1, batch size 64, learning rate 1×10⁻⁴, and AdamW optimizer (weight decay 0.01).' For Aloe: 'training 200 epochs, learning rate 5×10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1×10⁻⁸), linear learning rate decay, and batch size 128.' For Flamingo: 'training steps 30,000 (≈106 epochs), learning rate 5×10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1×10⁻⁸), linear learning rate decay, and batch size 96.' For BERT: 'We fine-tune a BERT-base model on the training set for 200 epochs, with learning rate 5×10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1×10⁻⁸), linear learning rate decay, and batch size 64.' |
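
For reference, the CLIP fine-tuning hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch training loop roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `model` and `train_loader` are hypothetical placeholders, and only the optimizer, dropout, epoch, and batch-size values are taken from the paper.

```python
# Minimal sketch (not the authors' code) of the quoted CLIP fine-tuning setup.
# `model` and `train_loader` are hypothetical placeholders; only the
# hyperparameters (600 epochs, dropout 0.1, lr 1e-4, AdamW weight decay 0.01)
# come from the paper's reported configuration.
import torch
import torch.nn as nn

model = nn.Linear(512, 5)      # placeholder for a CLIP-based answer classifier
train_loader = []              # placeholder for a MEWL training DataLoader (batch size 64)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,                   # learning rate 1e-4, as reported
    weight_decay=0.01,         # AdamW weight decay 0.01, as reported
)
criterion = nn.CrossEntropyLoss()
dropout = nn.Dropout(p=0.1)    # dropout 0.1, as reported

for epoch in range(600):       # 600 training epochs, as reported
    for features, labels in train_loader:
        optimizer.zero_grad()
        logits = model(dropout(features))
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
```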