MEWL: Few-shot multimodal word learning with referential uncertainty

Authors: Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments, we deploy MEWL to analyze machines' and humans' ability to perform few-shot word learning under the nine scenarios. We first benchmark machines on MEWL by analyzing multimodal (i.e., pre-trained vision-language) and unimodal models (i.e., Large Language Models (LLMs)). Our experimental results indicate that pre-trained vision-language models struggle to learn word meaning with only a few examples, lagging far behind what humans can do."
Researcher Affiliation | Academia | 1) Institute for AI, Peking University; 2) Yuanpei College, Peking University; 3) National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence; 4) School of Computer Science & Technology, Beijing Institute of Technology; 5) School of EECS, Peking University; 6) Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing; 7) School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University.
Pseudocode | No | The paper describes the methods and procedures used, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data: https://github.com/jianggy/MEWL.
Open Datasets | Yes | Code and data: https://github.com/jianggy/MEWL. ... "MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing."
Dataset Splits | Yes | "MEWL includes 27,000 problems for training, 5,400 problems for validation, and 5,400 problems for testing."
Hardware Specification | Yes | "All experiments run on eight NVIDIA A100 80GB GPUs."
Software Dependencies | No | The paper mentions software like CLIP, OPT, GPT-3.5 (text-davinci-003), and BERT, but it does not specify version numbers for general software dependencies (e.g., programming languages, or deep learning frameworks and libraries such as PyTorch or TensorFlow).
Experiment Setup | Yes | For CLIP: "The model is trained on the training set for 600 epochs, dropout 0.1, batch size 64, learning rate 1 × 10⁻⁴, and AdamW optimizer (weight decay 0.01)." For Aloe: "training 200 epochs, learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 128." For Flamingo: "training steps 30,000 (≈106 epochs), learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 96." For BERT: "We fine-tune a BERT-base model on the training set for 200 epochs, with learning rate 5 × 10⁻⁵, Adam optimizer (β1 = 0.9, β2 = 0.999, ϵ = 1 × 10⁻⁸), linear learning rate decay, and batch size 64."
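The hyperparameters in the Experiment Setup row can be collected into a single place for a reproduction attempt. The sketch below is illustrative only: the dictionary structure, key names, and the `linear_decay_lr` helper are our own conventions, not from the paper; only the numeric values (epochs/steps, learning rates, optimizer settings, batch sizes) are taken from the quoted text.

```python
# Hypothetical config sketch for the four baselines; structure and names are
# ours, values are the ones quoted in the Experiment Setup row above.
CONFIGS = {
    "clip": {"epochs": 600, "dropout": 0.1, "batch_size": 64,
             "lr": 1e-4, "optimizer": "AdamW", "weight_decay": 0.01},
    "aloe": {"epochs": 200, "batch_size": 128, "lr": 5e-5,
             "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
             "lr_schedule": "linear"},
    "flamingo": {"steps": 30_000, "batch_size": 96, "lr": 5e-5,
                 "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
                 "lr_schedule": "linear"},
    "bert": {"epochs": 200, "batch_size": 64, "lr": 5e-5,
             "optimizer": "Adam", "betas": (0.9, 0.999), "eps": 1e-8,
             "lr_schedule": "linear"},
}

def linear_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Linear learning-rate decay from base_lr down to 0 over total_steps,
    one common reading of the 'linear learning rate decay' quoted above."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

For example, under this schedule the Flamingo learning rate halfway through training (step 15,000 of 30,000) would be half the base rate, i.e. 2.5 × 10⁻⁵.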