MPMQA: Multimodal Question Answering on Product Manuals

Authors: Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

AAAI 2023

Reproducibility Assessment (variable, result, and supporting LLM response):
Research Type: Experimental. LLM Response: We construct a large-scale dataset PM209 with human annotations to support research on the MPMQA task. It contains 22,021 QA annotations over 209 product manuals from 27 well-known consumer electronics brands. We conduct experiments to validate our URA model on the proposed PM209 dataset. Table 4 shows the comparison between URA and the baselines described above.
Researcher Affiliation: Collaboration. LLM Response: (1) School of Information, Renmin University of China; (2) Samsung Research China Beijing (SRC-B).
Pseudocode: No. LLM Response: The paper describes its proposed model and methods in prose and with a block diagram (Figure 7), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. LLM Response: We release the dataset, code, and model at https://github.com/AIM3-RUC/MPMQA.
Open Datasets: Yes. LLM Response: We construct a large-scale dataset PM209 with human annotations to support research on the MPMQA task. It contains 22,021 QA annotations over 209 product manuals from 27 well-known consumer electronics brands. The dataset is available at https://github.com/AIM3-RUC/MPMQA.
Dataset Splits: Yes. LLM Response: We divide the manuals in the PM209 dataset into Train/Val/Test as shown in Table 3 ("Number of samples in each data split").
Hardware Specification: Yes. LLM Response: It takes about 20 hours to converge on 1 NVIDIA RTX A6000 GPU.
Software Dependencies: No. LLM Response: We implement the above-mentioned models based on PyTorch (Paszke et al. 2019) and Huggingface Transformers (Wolf et al. 2020). No specific version numbers are provided for PyTorch or Huggingface Transformers (a hedged environment sketch follows at the end of this section).
Experiment Setup: Yes. LLM Response: We train the models for 20 epochs with a batch size of 8 and a learning rate of 3e-5 (see the configuration sketch at the end of this section).
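
Because the paper names PyTorch and Huggingface Transformers without pinning versions, a reproduction must choose its own. Below is a minimal environment sanity check; it assumes nothing beyond the two libraries the paper cites, and the CUDA expectation reflects the reported RTX A6000 setup. Whatever versions it prints should be cross-checked against the authors' repository at https://github.com/AIM3-RUC/MPMQA, which is the authoritative source for actual dependency versions.

```python
# Minimal environment sanity check for a PM209/URA reproduction.
# The paper cites PyTorch and Huggingface Transformers but gives no
# version numbers, so record whatever the local environment provides.
import torch
import transformers

print(f"torch {torch.__version__}, transformers {transformers.__version__}")

# The paper reports ~20 h to convergence on a single NVIDIA RTX A6000
# (48 GB); confirm a CUDA device is visible before launching training.
assert torch.cuda.is_available(), "no CUDA device found"
print(torch.cuda.get_device_name(0))
```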
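
The reported hyperparameters (20 epochs, batch size 8, learning rate 3e-5) map one-to-one onto a standard Huggingface training configuration. The sketch below is illustrative only: the output path is hypothetical, and the use of `TrainingArguments` rather than the authors' own training loop is an assumption; the numeric values themselves come from the paper.

```python
from transformers import TrainingArguments

# Hyperparameters as reported in the paper; everything else (output
# directory, logging/evaluation cadence, optimizer defaults) is assumed.
training_args = TrainingArguments(
    output_dir="./ura_pm209",        # hypothetical path
    num_train_epochs=20,             # reported: 20 epochs
    per_device_train_batch_size=8,   # reported: batch size 8
    learning_rate=3e-5,              # reported: 3e-5
)
```

Paired with the Train/Val/Test splits from Table 3 and the released URA code, these arguments would drive a conventional `Trainer` run.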