Improving the Cross-Lingual Generalisation in Visual Question Answering

Authors: Farhad Nooralahzadeh, Rico Sennrich

AAAI 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models.
Researcher Affiliation Academia Farhad Nooralahzadeh, Rico Sennrich Department of Computational Linguistics, University of Zurich fahrad.nooralahzadeh@uzh.ch, sennrich@cl.uzh.ch
Pseudocode Yes Algorithm 1: Iterative Magnitude Pruning (IMP) with rewinding step (Han, Mao, and Dally 2016).
Input: Model f(·; θ) initialized with pretrained parameters θ0. Parameter: p%: a pruning rate. Output: M.
1: Set the initial pruning mask to M = 1^|θ|.
2: while not done do
3:   Train f(·; M ⊙ θ0) to step t: f(·; M ⊙ θt).
4:   Prune p% of the remaining weights of M ⊙ θt and update M accordingly.
5: end while
6: Return f(·; M ⊙ θ0).
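Algorithm 1 can be sketched in plain NumPy. This is a minimal illustration of iterative magnitude pruning with weight rewinding, not the authors' implementation; the `train` callable is a hypothetical stand-in for a fine-tuning epoch.

```python
import numpy as np

def imp_with_rewinding(theta0, prune_rate=0.2, rounds=3, train=None):
    """Iterative magnitude pruning with rewinding (sketch of Algorithm 1).

    theta0     : flat array of pretrained weights (the rewind point)
    prune_rate : fraction of the *remaining* weights pruned each round
    train      : callable mapping masked weights -> trained weights
                 (hypothetical stand-in for fine-tuning; identity if None)
    """
    mask = np.ones_like(theta0)                       # step 1: M = 1^|theta|
    for _ in range(rounds):                           # step 2: while not done
        theta_t = train(mask * theta0) if train else mask * theta0  # step 3
        # Step 4: prune p% of the remaining (unmasked) weights with the
        # lowest magnitude, then update the mask accordingly.
        remaining = np.flatnonzero(mask)
        k = int(len(remaining) * prune_rate)
        if k > 0:
            mags = np.abs(theta_t[remaining])
            drop = remaining[np.argsort(mags)[:k]]
            mask[drop] = 0.0
    return mask * theta0                              # step 6: rewind to theta0
```

Note the rewinding in the return statement: the surviving weights are reset to their pretrained values θ0, so only the mask M carries information from fine-tuning.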
Open Source Code Yes Code and data to reproduce our findings are publicly available: https://github.com/nooralahzadeh/CLG-VQA
Open Datasets Yes Recently, Pfeiffer et al. (2022) introduce a typologically diverse multilingual and multimodal benchmark for the VQA task by extending the monolingual English-only GQA (Hudson and Manning 2019) dataset. [...] We adopt the codebase of the IGLUE benchmark (https://iglue-benchmark.github.io/) to implement our proposed approach, and we keep the model and training hyperparameter values equal to those reported by Bugliarello et al. (2022).
Dataset Splits Yes We set hyper-parameters d1 = 0.8, d2 = 0.8, k = 10 and α = 10 based on validation set performance. They utilize 12,578 questions and 398 images from the test and development set of GQA, where the questions are manually translated into 7 different languages, covering 5 different scripts: Bengali (Bn), German (De), Indonesian (Id), Korean (Ko), Portuguese (Pt), Russian (Ru) and simplified Chinese (Zh).
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided.
Software Dependencies No To compute the cosine distance among 1,842 labels in xGQA, we use the spaCy toolkit, where an embedding emb_y ∈ R^300 of each label is derived from GloVe (Pennington, Socher, and Manning 2014) pretrained word embeddings. ... Using the PyTorch pruning module, we extract the subnetwork from the pretrained weights θ0 following Algorithm 1 and Step 0 of the SFT strategy.
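The label-distance computation described above can be illustrated with a small sketch. Toy 4-dimensional vectors stand in here for the 300-dimensional GloVe embeddings the paper obtains via spaCy; the label names and values are invented for illustration.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two embedding vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for the 300-d GloVe label embeddings (emb_y in R^300)
# used to relate the 1,842 xGQA answer labels.
emb = {
    "cat": np.array([1.0, 0.2, 0.0, 0.1]),
    "kitten": np.array([0.9, 0.3, 0.1, 0.1]),
    "airplane": np.array([0.0, 0.1, 1.0, 0.8]),
}
```

With these vectors, semantically close labels ("cat", "kitten") end up at a much smaller cosine distance than unrelated ones ("cat", "airplane").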
Experiment Setup Yes We set hyper-parameters d1 = 0.8, d2 = 0.8, k = 10 and α = 10 based on validation set performance. More specifically, we perform IMP and prune a set of weights with the lowest magnitude globally throughout the network after each fine-tuning epoch (number of epochs = 5).
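The global pruning step quoted above (rank all weights across the network together, not per layer) can be sketched as follows. This mirrors what PyTorch's global unstructured pruning does, but in plain NumPy; the helper name and the layer dict are illustrative, not from the paper's code.

```python
import numpy as np

def global_magnitude_masks(layers, prune_fraction):
    """Mask the globally lowest-magnitude `prune_fraction` of weights.

    layers : dict mapping layer name -> weight array
    Returns a dict of binary masks with matching shapes.
    """
    # Pool the magnitudes of *all* weights across layers, so the pruning
    # threshold is global rather than computed per layer.
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers.values()])
    threshold = np.quantile(all_mags, prune_fraction)
    return {name: (np.abs(w) >= threshold).astype(w.dtype)
            for name, w in layers.items()}
```

Because the threshold is shared, a layer whose weights are uniformly small can lose more than `prune_fraction` of its parameters while a layer with large weights loses fewer, which is the intended behaviour of global magnitude pruning.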