ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

Authors: Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy, which shows that current AI models still lack physical commonsense for the continuum, especially soft-bodies, and illustrates the value of the proposed dataset. We also evaluate a series of traditional AI models (Hudson & Manning, 2018; Li et al., 2022a; Le et al., 2020) and recent multimodal large language models (Team et al., 2023; Achiam et al., 2023) on ContPhy.
Researcher Affiliation | Collaboration | 1 Tsinghua University, 2 Wuhan University, 3 MIT-IBM Watson AI Lab, 4 Massachusetts Institute of Technology, 5 UMass Amherst
Pseudocode | No | The paper includes Python-style code snippets in the appendix (Listings 3-7) as 'Qualitative Examples' and 'Full API', but these are not explicitly labeled or referred to as 'pseudocode' or an 'algorithm block'.
Open Source Code | Yes | Project page: https://physical-reasoning-project.github.io
Open Datasets | Yes | We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. ContPhy aims to spur progress in perception and reasoning within diverse physical settings, narrowing the divide between human and machine intelligence in understanding the physical world. Project page: https://physical-reasoning-project.github.io
Dataset Splits | Yes | We divided the dataset into three subsets: 50% for training, 20% for validation, and 30% for testing.
Hardware Specification | No | The paper mentions 'computation support from AiMOS, a server cluster' and 'batch size is 16 for 8 GPUs', implying the use of GPUs, but it does not specify any exact GPU or CPU models, memory details, or specific cluster configurations.
Software Dependencies | No | The paper mentions software like 'Unity engine', 'Mask R-CNN', 'Detectron2', 'DPI-Net', 'Material Point Method (MPM)', 'ChatGPT (Ouyang et al., 2022)', 'GPT-4 (gpt-4-0125-preview)', and 'CUDA'. However, for most of these, specific version numbers are not provided, which is necessary for reproducibility.
Experiment Setup | Yes | For the Visual Perception Module... the batch size is 16 for 8 GPUs... train the model for 50k iterations, with a learning rate of 0.02. ... For the fluid scenario... the simulation time step is 1/3000... Initial physical properties include κ = 1 × 10^3, default viscosity μ = 0.01, and default density ρ = 1000. Learning rates for viscosity μ and density ρ are 0.001 and 0.1 respectively. ... For the ball scenario... the simulation time step is 1/6000/32... default Young's modulus E = 0.1, default Poisson's ratio ν = 0.1, and default yield stress 3 × 10^-2. ... For the rope scenario... trained the model for 50k iterations with a batch size of 1 and a learning rate of 0.0001.
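The split ratios and hyperparameters quoted above can be collected into a single configuration sketch. This is purely illustrative: the key names and structure are assumptions for readability, not taken from the authors' code, and only the numeric values come from the paper.

```python
# Hedged sketch: hyperparameters quoted in the ContPhy paper, organized as a
# config dict. Key names are illustrative assumptions, not the authors' code.
CONFIG = {
    # 50% / 20% / 30% train / validation / test split
    "dataset_split": {"train": 0.50, "val": 0.20, "test": 0.30},
    # Visual Perception Module: batch size 16 across 8 GPUs, 50k iterations
    "perception": {
        "batch_size": 16,
        "num_gpus": 8,
        "iterations": 50_000,
        "learning_rate": 0.02,
    },
    # Fluid scenario: simulation time step and default physical properties
    "fluid": {
        "dt": 1 / 3000,
        "kappa": 1e3,            # κ = 1 × 10^3
        "viscosity_mu": 0.01,    # default viscosity μ
        "density_rho": 1000,     # default density ρ
        "lr_viscosity": 0.001,
        "lr_density": 0.1,
    },
    # Ball scenario: elastoplastic material defaults
    "ball": {
        "dt": 1 / 6000 / 32,
        "youngs_modulus": 0.1,   # E
        "poissons_ratio": 0.1,   # ν
        "yield_stress": 3e-2,
    },
    # Rope scenario training schedule
    "rope": {
        "iterations": 50_000,
        "batch_size": 1,
        "learning_rate": 0.0001,
    },
}

# Sanity check: the reported split fractions should cover the whole dataset.
assert abs(sum(CONFIG["dataset_split"].values()) - 1.0) < 1e-9
```

Laying the values out this way also makes the gaps flagged above concrete: the config records learning rates and time steps, but nothing here pins down GPU models or library versions, which is exactly why the Hardware Specification and Software Dependencies rows are marked "No".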