Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
MPMQA: Multimodal Question Answering on Product Manuals
Authors: Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We construct a large-scale dataset PM209 with human annotations to support the research on the MPMQA task. It contains 22,021 QA annotations over 209 product manuals in 27 well-known consumer electronic brands. We conduct experiments to validate our URA model on the proposed PM209 dataset. Table 4 shows the comparison between URA and the baselines described above. |
| Researcher Affiliation | Collaboration | (1) School of Information, Renmin University of China; (2) Samsung Research China Beijing (SRC-B) |
| Pseudocode | No | The paper describes its proposed model and methods in prose and with a block diagram (Figure 7), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA. We release the dataset, code, and model at https://github.com/AIM3-RUC/MPMQA. |
| Open Datasets | Yes | We construct a large-scale dataset PM209 with human annotations to support the research on the MPMQA task. It contains 22,021 QA annotations over 209 product manuals in 27 well-known consumer electronic brands. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA. |
| Dataset Splits | Yes | We divide the manuals in the PM209 dataset into Train/Val/Test as shown in Table 3. Table 3: Number of samples in each data split. |
| Hardware Specification | Yes | It takes about 20 hours to converge on 1 NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | We implement the above-mentioned models based on Pytorch (Paszke et al. 2019) and Huggingface Transformers (Wolf et al. 2020). No specific version numbers are provided for PyTorch or Huggingface Transformers. |
| Experiment Setup | Yes | We train the models for 20 epochs with a batch size of 8 and a learning rate of 3e-5. |
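
The reported setup (20 epochs, batch size 8, learning rate 3e-5, one NVIDIA RTX A6000 GPU) can be captured in a small configuration object for anyone attempting a rerun. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `steps_per_epoch`, and `total_steps` are hypothetical, and the number of training samples is left as a parameter because this summary does not give it (the paper reports it in Table 3). The optimizer and learning-rate schedule are not stated here, so they are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the table above (class name is hypothetical)."""
    epochs: int = 20              # "We train the models for 20 epochs"
    batch_size: int = 8           # "with a batch size of 8"
    learning_rate: float = 3e-5   # "and a learning rate of 3e-5"
    num_gpus: int = 1             # "1 NVIDIA RTX A6000 GPU"


def steps_per_epoch(num_train_samples: int, cfg: TrainConfig) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    if num_train_samples < 0:
        raise ValueError("num_train_samples must be non-negative")
    # Ceiling division without importing math.
    return -(-num_train_samples // cfg.batch_size)


def total_steps(num_train_samples: int, cfg: TrainConfig) -> int:
    """Total optimizer steps over the full training run."""
    return steps_per_epoch(num_train_samples, cfg) * cfg.epochs


if __name__ == "__main__":
    cfg = TrainConfig()
    # 1000 is a placeholder sample count, not a figure from the paper.
    print(cfg, total_steps(1000, cfg))
```

Keeping the sample count as an argument avoids hard-coding a split size that is not reproduced in this summary; substitute the Train figure from the paper's Table 3 when estimating the step budget.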