MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

Authors: Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our proposed approaches significantly improve over the competitive baselines." and "Table 2: Automatic evaluation results on multimodal script generation and subsequent step prediction tasks."
Researcher Affiliation | Academia | "Department of Computer Science, Virginia Tech"
Pseudocode | No | The paper describes its methods in prose and with diagrams (Figure 2), but it does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "The codes, model checkpoints, and datasets are publicly available at https://github.com/VT-NLP/MultiScript."
Open Datasets | Yes | "Built from WikiHow, MULTISCRIPT covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains." and "The codes, model checkpoints, and datasets are publicly available at https://github.com/VT-NLP/MultiScript."
Dataset Splits | Yes | "We split the instances created for each task into training, development, and test sets. For each task, to ensure the coverage of various domains in each set, we randomly sample 80%, 5%, and 15% articles from each domain, and use the instances created from them as the training, development, and test sets." (A hedged sketch of such a per-domain split appears below the table.)
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used to run the experiments.
Software Dependencies | No | The paper mentions software such as UniVL, the OFA model, Katna, Vicuna, and DeBERTa, and names some model checkpoints (OFA-Sys/ofa-base, nli-deberta-v3-base), but it does not give version numbers for the underlying libraries or tools (e.g., PyTorch 1.9, TensorFlow 2.x). (A hedged checkpoint-loading sketch appears below the table.)
Experiment Setup | No | The paper describes the overall framework and the models used (e.g., Katna, OFA, UniVL, DeBERTa, Vicuna) and how they interact, but it does not specify concrete experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer configurations.
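
The Dataset Splits row quotes a per-domain 80% / 5% / 15% sampling of articles into training, development, and test sets. The snippet below is a minimal sketch of such a stratified split, not the authors' released code; the list-of-dicts schema, the `domain` field name, and the fixed seed are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of the split described in the paper:
# within each domain, articles are shuffled and divided 80% / 5% / 15% into
# train / dev / test, so every split covers all domains.
import random
from collections import defaultdict

def split_by_domain(articles, seed=42, ratios=(0.80, 0.05, 0.15)):
    """articles: list of dicts, each with a 'domain' key (assumed schema)."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for article in articles:
        by_domain[article["domain"]].append(article)

    train, dev, test = [], [], []
    for domain_articles in by_domain.values():
        rng.shuffle(domain_articles)
        n = len(domain_articles)
        n_train = int(ratios[0] * n)
        n_dev = int(ratios[1] * n)
        train += domain_articles[:n_train]
        dev += domain_articles[n_train:n_train + n_dev]
        test += domain_articles[n_train + n_dev:]
    return train, dev, test
```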
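
The Software Dependencies row notes that checkpoints such as nli-deberta-v3-base are named without pinned library versions. As a hedged illustration only, the snippet below loads that checkpoint as a cross-encoder; the use of the sentence-transformers library and the "cross-encoder/" hub prefix are assumptions, not details reported in the paper.

```python
# Hedged illustration, not the authors' code: load the NLI checkpoint named in
# the paper via sentence-transformers (library choice and "cross-encoder/"
# hub prefix are assumptions).
from sentence_transformers import CrossEncoder

nli_model = CrossEncoder("cross-encoder/nli-deberta-v3-base")

# Score a premise/hypothesis pair of step descriptions
# (logits over contradiction / entailment / neutral).
scores = nli_model.predict([("Cut the apple into thin slices.",
                             "The apple is sliced.")])
print(scores)
```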