MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Authors: Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed approaches significantly improve over the competitive baselines. and Table 2: Automatic evaluation results on multimodal script generation and subsequent step prediction tasks. |
| Researcher Affiliation | Academia | Department of Computer Science, Virginia Tech |
| Pseudocode | No | The paper describes the methods in prose and with diagrams (Figure 2), but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes, model checkpoints, and datasets are publicly available at https://github.com/VT-NLP/MultiScript. |
| Open Datasets | Yes | Built from WikiHow, MULTISCRIPT covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. and The codes, model checkpoints, and datasets are publicly available at https://github.com/VT-NLP/MultiScript. |
| Dataset Splits | Yes | We split the instances created for each task into training, development, and test sets. For each task, to ensure the coverage of various domains in each set, we randomly sample 80%, 5%, and 15% articles from each domain, and use the instances created from them as the training, development, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running experiments. |
| Software Dependencies | No | The paper mentions software like UniVL, the OFA model, Katna, Vicuna, and DeBERTa, and specifies some model checkpoints (OFA-Sys/ofa-base, nli-deberta-v3-base), but does not provide specific version numbers for the underlying software libraries or tools (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | No | The paper describes the overall framework and models used (e.g., Katna, OFA, UniVL, DeBERTa, Vicuna) and how they interact, but it does not specify concrete experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer configurations. |
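The Dataset Splits row describes a domain-stratified random split (80% train / 5% dev / 15% test, sampled per domain so every domain appears in every set). A minimal sketch of that procedure, assuming articles are given as `(article_id, domain)` pairs (the actual MULTISCRIPT data format may differ, and `split_by_domain` is a hypothetical helper name):

```python
import random
from collections import defaultdict

def split_by_domain(articles, ratios=(0.80, 0.05, 0.15), seed=0):
    """Randomly split articles into train/dev/test within each domain,
    so each set covers all domains (per the paper's described protocol).
    `articles`: iterable of (article_id, domain) pairs."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for article_id, domain in articles:
        by_domain[domain].append(article_id)

    train, dev, test = [], [], []
    for domain, ids in sorted(by_domain.items()):
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_dev = round(len(ids) * ratios[1])
        # Remainder goes to test, so every article is assigned exactly once.
        train += ids[:n_train]
        dev += ids[n_train:n_train + n_dev]
        test += ids[n_train + n_dev:]
    return train, dev, test
```

With 4 domains of 100 articles each, this yields 320/20/60 train/dev/test articles; the instances created from each article would then follow their article's assignment, as the paper describes.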