VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Authors: Sihan Chen, Handong Li, Qunbo Wang, Zijia Zhao, Mingzhen Sun, Xinxin Zhu, Jing Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have been conducted to demonstrate the effectiveness of our proposed VAST-27M corpus and VAST foundation model. VAST achieves 22 new state-of-the-art results on various cross-modality benchmarks. |
| Researcher Affiliation | Collaboration | Sihan Chen1,2, Handong Li1,2, Qunbo Wang2, Zijia Zhao1,2, Mingzhen Sun1,2, Xinxin Zhu2, Jing Liu1,2. 1 School of Artificial Intelligence, University of Chinese Academy of Sciences; 2 Institute of Automation, Chinese Academy of Sciences |
| Pseudocode | No | The paper provides diagrams and textual descriptions of the methods but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, model and dataset will be released at https://github.com/TXH-mercury/VAST. |
| Open Datasets | Yes | Code, model and dataset will be released at https://github.com/TXH-mercury/VAST. The training is conducted on a combination corpus consisting of VAST-27M, VALOR-1M, WavCaps, CC14M, and 110M randomly sampled pairs from LAION-400M |
| Dataset Splits | Yes | Specific train/val/test splits of those benchmarks can be found in Table 9 |
| Hardware Specification | Yes | VAST is trained using the PyTorch framework on 64 Tesla V100 cards. |
| Software Dependencies | No | The paper mentions the 'PyTorch framework' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The training is conducted... for a total of 200K training steps. At each training step, one corpus is sampled for training. ... The initial learning rate is set to 1e-4, and a linear decay schedule is used. The batch size is set to 1024. Specific finetuning hyperparameters of VAST for different benchmarks are presented in Table 10. |
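The reported training schedule (200K steps, base learning rate 1e-4 with linear decay, batch size 1024, one corpus sampled per step) can be sketched as below. This is a minimal illustration, not the authors' code: the corpus list follows the paper, but the uniform sampling weights and the decay-to-zero endpoint are assumptions, since the paper does not specify them.

```python
import random

# Values reported in the paper's experiment setup.
CORPORA = ["VAST-27M", "VALOR-1M", "WavCaps", "CC14M", "LAION-400M (110M subset)"]
TOTAL_STEPS = 200_000
BASE_LR = 1e-4
BATCH_SIZE = 1024


def linear_decay_lr(step: int, total_steps: int = TOTAL_STEPS,
                    base_lr: float = BASE_LR) -> float:
    """Linearly decay the learning rate from base_lr toward 0 over total_steps.

    Assumption: the paper says 'a linear decay schedule is used' but does not
    give the final LR, so this sketch decays to zero.
    """
    return base_lr * max(0.0, 1.0 - step / total_steps)


def sample_corpus(rng: random.Random) -> str:
    """Sample one corpus for the current step (uniform weights are an assumption)."""
    return rng.choice(CORPORA)


rng = random.Random(0)
for step in range(3):
    lr = linear_decay_lr(step)
    corpus = sample_corpus(rng)
    # train_one_step(corpus, batch_size=BATCH_SIZE, lr=lr)  # hypothetical hook
```

Per-benchmark finetuning would then swap in the hyperparameters listed in the paper's Table 10.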