RETRACTED: MCOMET: Multimodal Fusion Transformer for Physical Audiovisual Commonsense Reasoning

Authors: Daoming Zong, Shiliang Sun

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our model on a very recent public benchmark, PACS. Results show that MCOMET significantly outperforms a variety of strong baselines, revealing powerful multi-modal commonsense reasoning capabilities. Abundant ablation studies are also conducted to validate the key ingredients of MCOMET."
Researcher Affiliation | Academia | Daoming Zong and Shiliang Sun*, School of Computer Science and Technology, East China Normal University, Shanghai, China. ecnuzdm@gmail.com, slsun@cs.ecnu.edu.cn
Pseudocode | No | The paper describes the model architecture and its components using text and mathematical equations, but it does not include a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper states: 'The models and checkpoints are available at https://huggingface.co/models?other=deberta-v3'. This link refers to the DeBERTa models, which were used as a component, not to the authors' own MCOMET source code. There is no statement providing access to the MCOMET code.
Open Datasets | Yes | "Concretely, we use the PACS dataset and benchmark MCOMET on two tasks... PACS (Yu et al. 2022) conceptualizes the datapoints."
Dataset Splits | Yes | "Train/val/test splits consist of 3,460/444/445 datapoints, respectively."
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or cloud computing instances).
Software Dependencies | No | The paper names several models and encoders used (e.g., Vision Transformer (ViT), Audio Spectrogram Transformer (AST), Temporal Difference Network (TDN), DeBERTaV3), but it does not specify version numbers for these components or for any other libraries and dependencies used in the implementation.
Experiment Setup | No | While the paper describes the general pipeline and components in its 'Implementation Details' section, it does not provide concrete experimental settings such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer configuration for MCOMET.