Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning

Authors: Changsheng Lv, Shuai Zhang, Yapeng Tian, Mengshi Qi, Huadong Ma

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance.
Researcher Affiliation | Academia | Changsheng Lv (1,2), Shuai Zhang (1,2), Yapeng Tian (3), Mengshi Qi (1,2, corresponding author), and Huadong Ma (1,2). (1) Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia; (2) Beijing University of Posts and Telecommunications; (3) Department of Computer Science, The University of Texas at Dallas. {lvchangsheng, zshuai, qms, mhd}@bupt.edu.cn, yapeng.tian@utdallas.edu
Pseudocode | No | The paper describes the proposed approach in text and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code is available at https://github.com/Andy20178/DCL.
Open Datasets | Yes | The Physical Audiovisual Common Sense Reasoning Dataset (PACS) [2] is a collection of 13,400 question-answer pairs designed for testing physical commonsense reasoning capabilities.
Dataset Splits | Yes | Following [2], we divide PACS into 11,044/1,192/1,164 as train/val/test sets, which contain 1,224/150/152 objects respectively. We partitioned the PACS-Material subset into 3,460/444/445 for train/val/test under the same object distribution as PACS.
Hardware Specification | Yes | We implement our proposed model with PyTorch on two NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or other software dependencies with versions.
Experiment Setup | Yes | Specifically, we downsampled each video to T = 8 frames during pre-processing and set the feature dimension as d = 256. In the Disentangled Sequence Encoder, we used a hidden layer size of 256 for the Bi-LSTM. During optimization, we set the batch size as 64, which consisted of 64 video pairs and the corresponding questions. In the Counterfactual Learning Module, τ = 2 and k = 5 were used when calculating similarities and constructing the physical knowledge relationships.
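The split counts in the Dataset Splits row can be cross-checked against the 13,400 question-answer pairs reported in the Open Datasets row. Below is a minimal sanity-check sketch; the constant names are hypothetical and not taken from the authors' repository.

```python
# Hypothetical constants holding the split sizes quoted in the Dataset Splits row.
PACS_QA_SPLITS = {"train": 11_044, "val": 1_192, "test": 1_164}      # question-answer pairs
PACS_OBJECT_SPLITS = {"train": 1_224, "val": 150, "test": 152}       # objects per split
PACS_MATERIAL_SPLITS = {"train": 3_460, "val": 444, "test": 445}     # PACS-Material QA pairs

# The QA-pair splits sum to the 13,400 pairs quoted in the Open Datasets row.
assert sum(PACS_QA_SPLITS.values()) == 13_400
print(sum(PACS_MATERIAL_SPLITS.values()))  # 4349 pairs in the PACS-Material subset
```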
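The Experiment Setup row fixes the main hyperparameters (T = 8 frames, d = 256, Bi-LSTM hidden size 256, batches of 64 video pairs, τ = 2, k = 5). The sketch below shows one way these values could fit together in PyTorch; the class and function names (SequenceEncoder, topk_similarities) and the temperature-scaled cosine similarity are illustrative assumptions, not the authors' implementation, which lives in the repository linked above.

```python
# Minimal sketch (not the authors' code; see https://github.com/Andy20178/DCL for the
# real implementation) of how the quoted hyperparameters could fit together.
import torch
import torch.nn as nn
import torch.nn.functional as F

T, D = 8, 256          # frames per video, feature dimension
HIDDEN = 256           # Bi-LSTM hidden size
TAU, K = 2.0, 5        # temperature and neighbour count quoted above
BATCH = 64             # 64 video pairs per batch

class SequenceEncoder(nn.Module):
    """Toy stand-in for a sequence encoder with the quoted Bi-LSTM size (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.bilstm = nn.LSTM(D, HIDDEN, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * HIDDEN, D)   # fold both directions back to d = 256

    def forward(self, frames):                 # frames: (batch, T, D)
        out, _ = self.bilstm(frames)
        return self.proj(out).mean(dim=1)      # one d-dimensional embedding per video

def topk_similarities(video_emb):
    """Temperature-scaled cosine similarities and the k nearest neighbours,
    a generic way to sparsify relationships between videos (hypothetical)."""
    z = F.normalize(video_emb, dim=-1)
    sim = torch.softmax(z @ z.t() / TAU, dim=-1)          # (batch, batch)
    neighbours = sim.topk(K + 1, dim=-1).indices[:, 1:]   # drop the self-match
    return sim, neighbours

encoder = SequenceEncoder()
frames = torch.randn(BATCH, T, D)              # stand-in for pre-extracted frame features
emb = encoder(frames)
sim, neighbours = topk_similarities(emb)
print(emb.shape, neighbours.shape)             # torch.Size([64, 256]) torch.Size([64, 5])
```

Under these assumptions, dividing the similarities by τ = 2 before the softmax flattens the distribution, and keeping only the k = 5 most similar videos is one plausible way to construct the sparse physical knowledge relationships mentioned in the quote.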