Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Authors: Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Josh Tenenbaum, Chuang Gan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | More accurate dynamics prediction in learned physics models enables state-of-the-art performance on both synthetic and real-world benchmarks while still maintaining high transparency and interpretability; most notably, VRDP improves the accuracy of predictive and counterfactual questions by 4.5% and 11.5% compared to its best counterpart. |
| Researcher Affiliation | Collaboration | Mingyu Ding (MIT CSAIL and HKU); Zhenfang Chen (MIT-IBM Watson AI Lab); Tao Du (MIT CSAIL); Ping Luo (HKU); Joshua B. Tenenbaum (MIT BCS, CBMM, CSAIL); Chuang Gan (MIT-IBM Watson AI Lab) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The model details and training objectives are described in narrative text and mathematical equations. |
| Open Source Code | Yes | Project page: http://vrdp.csail.mit.edu/ |
| Open Datasets | Yes | To validate the effectiveness of our method for reasoning about the physical world, we conduct main experiments on the CLEVRER [79] dataset, as it contains both language and physics cues such as rigid body collisions and dynamics... For real-world scenarios, we conduct experiments on the Real-Billiard [63] dataset, which contains three-cushion billiards videos captured in real games for dynamics prediction. |
| Dataset Splits | Yes | We collect a few-shot physical reasoning dataset with novel language and physical concepts (e.g., heavier and lighter), termed generalized CLEVRER, containing 100 videos (split into 25/25/50 for train/validation/test) with 375 options in 158 counterfactual questions. |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory, or detailed computer specifications) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using a 'pre-trained Faster R-CNN model [31]' and 'Slot-Attention model [57]' and refers to an 'impulse-based differentiable rigid-body simulator [37, 62, 14]', but does not provide specific version numbers for these software components or any other ancillary software dependencies. |
| Experiment Setup | Yes | We set t = 0.004s, K = 10, S = 10, and T = 128 for CLEVRER [79] and T = 20 for Real-Billiard [63]. More details of the dataset and settings can be found in Supplemental Materials. |
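
For anyone attempting a reproduction, the quoted setup values and the 25/25/50 split of the 100 generalized CLEVRER videos can be recorded in a small script. The following is a minimal sketch, not the authors' code: the names `SetupValues` and `split_videos` are hypothetical, the field names are illustrative, and the seeded random shuffle is an assumption, since the table does not say how videos were assigned to each split.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupValues:
    """Values quoted in the Experiment Setup row.

    Field names are illustrative; the meaning of each symbol is defined
    in the paper and is not restated here.
    """
    t_seconds: float = 0.004
    K: int = 10
    S: int = 10
    T_clevrer: int = 128
    T_real_billiard: int = 20


def split_videos(video_ids, sizes=(25, 25, 50), seed=0):
    """Hypothetical helper: shuffle ids, then cut into train/validation/test.

    The seeded shuffle is an assumption made for repeatability; the source
    only reports the split sizes.
    """
    ids = list(video_ids)
    if sum(sizes) != len(ids):
        raise ValueError("split sizes must sum to the number of videos")
    random.Random(seed).shuffle(ids)
    n_train, n_val, _ = sizes
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


setup = SetupValues()
train, val, test = split_videos(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets identical across runs, which matters when only 25 videos are available for training.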