RETRACTED: McOmet: Multimodal Fusion Transformer for Physical Audiovisual Commonsense Reasoning
Authors: Daoming Zong, Shiliang Sun
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on a very recent public benchmark, PACS. Results show that MCOMET significantly outperforms a variety of strong baselines, revealing powerful multi-modal commonsense reasoning capabilities. Abundant ablation studies are also conducted to validate the key ingredients of MCOMET. |
| Researcher Affiliation | Academia | Daoming Zong and Shiliang Sun*, School of Computer Science and Technology, East China Normal University, Shanghai, China; ecnuzdm@gmail.com, slsun@cs.ecnu.edu.cn |
| Pseudocode | No | The paper describes the model architecture and its components using text and mathematical equations, but it does not include a distinct block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper states: 'The models and checkpoints are available at https://huggingface.co/models?other=deberta-v3'. This link refers to the DeBERTa models, which were used as a component, not the authors' own MCOMET source code. There is no statement providing access to the MCOMET code. (A minimal sketch of what that link actually provides is shown after this table.) |
| Open Datasets | Yes | Concretely, we use the PACS dataset and benchmark MCOMET on two tasks... PACS (Yu et al. 2022) conceptualizes the datapoints. |
| Dataset Splits | Yes | Train/val/test splits consist of 3,460/444/445 datapoints, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instances). |
| Software Dependencies | No | The paper mentions several models and encoders used (e.g., 'Vision Transformer (ViT)', 'Audio Spectrogram Transformer (AST)', 'Temporal Difference Network (TDN)', 'DeBERTa V3'), but it does not specify version numbers for these software components or any other libraries/dependencies used in their implementation. |
| Experiment Setup | No | While the paper describes the general pipeline and components used in the 'Implementation Details' section, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings for MCOMET. |
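
Regarding the Open Source Code row: the cited URL points to Hugging Face checkpoint listings for DeBERTa-v3, not to MCOMET's own code or weights. The sketch below, which assumes the `transformers` library and the publicly available `microsoft/deberta-v3-base` checkpoint (neither is specified by the paper), shows the kind of component loading that link enables; it is illustrative only and is not the authors' released implementation.

```python
# Minimal sketch: load a DeBERTa-v3 checkpoint from the Hugging Face hub.
# The checkpoint name "microsoft/deberta-v3-base" is an assumption; the paper
# only links to https://huggingface.co/models?other=deberta-v3 and does not
# release MCOMET's training code or weights.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

# Encode a sample question, as the text branch of a multimodal pipeline might.
inputs = tokenizer("Which object is heavier?", return_tensors="pt")
text_features = model(**inputs).last_hidden_state  # shape: (1, seq_len, hidden_dim)
```

Reproducing MCOMET itself would still require the unreleased fusion model, training hyperparameters, and checkpoints noted as missing in the rows above.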