Latent Intuitive Physics: Learning to Transfer Hidden Physics from a 3D Video

Authors: Xiangming Zhu, Huayu Deng, Haochen Yuan, Yunbo Wang, Xiaokang Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation. Our model demonstrates strong performance in all three tasks." (Section 5, Experiments)
Researcher Affiliation | Academia | Xiangming Zhu, Huayu Deng, Haochen Yuan, Yunbo Wang, Xiaokang Yang — MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University ({xmzhu76, deng_hy99, yuanhaochen, yunbow, xkyang}@sjtu.edu.cn)
Pseudocode | Yes | Algorithm 1: Learning procedures in the visual scene
Open Source Code | No | The paper provides a project website (https://sites.google.com/view/latent-intuitive-physics/), which states "Code will be released soon," indicating the code is not yet publicly available.
Open Datasets | No | The paper describes how its datasets were generated using tools such as DFSPH (Bender & Koschier, 2015) and Blender (Community, 2018), and links to the SPlisHSPlasH framework (https://github.com/InteractiveComputerGraphics/SPlisHSPlasH) used for simulation. However, it does not provide direct access (link, DOI, or specific repository) to the generated datasets used for training.
Dataset Splits | No | Section C.1 states: "The particle dataset contains 600 scenes, with 540 scenes used for training and 60 scenes reserved for the test set." A dedicated validation split is not explicitly specified with percentages or counts.
Hardware Specification | Yes | The experiments are conducted on 4 NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions the ADAM optimizer (Kingma & Ba, 2015) but does not provide version numbers for any key software components or libraries (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | "We train our model and the baselines with multi-view observations on the fluid sequence of the Cuboid geometry. Before Stage B, the PhysNeRF is fine-tuned on visual observations for 100k steps with a learning rate of 3e-4 and exponential learning rate decay γ = 0.1, given the estimated initial state and multi-view observations of the first frame. After that, we freeze Rϕ and Tθ (pretrained on the Particle Dataset) and infer the visual posterior by backpropagating the rendering loss. Then the physical prior learner pψ is trained to adapt to the inferred visual posterior. The visual posterior latent and the physical prior learner are separately optimized for 100k steps and 50k steps in Stage B and Stage C, with a learning rate of 1e-4 and a cosine annealing scheduler." and "The ADAM optimizer (Kingma & Ba, 2015) is used with an initial learning rate of 0.001 and a batch size of 16 for 50k iterations. We follow previous works (Ummenhofer et al., 2020; Prantl et al., 2022) to set a scheduled learning rate decay where the learning rate is halved every 5k iterations, beginning at iteration 25k. The latent distribution of each particle is an 8-dimensional Gaussian with parameterized mean and standard deviation. The KL regularizer β is set to 0.1, as shown in Table 7."
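The scheduled learning-rate decay quoted above (Adam starting at 0.001, halved every 5k iterations beginning at iteration 25k) can be sketched as a plain function; the function name and keyword defaults are illustrative assumptions, not taken from the paper's code:

```python
def scheduled_lr(iteration: int,
                 base_lr: float = 1e-3,
                 decay_start: int = 25_000,
                 decay_every: int = 5_000) -> float:
    """Return the learning rate at a given training iteration.

    Before `decay_start` the base rate is used unchanged; from
    `decay_start` onward the rate is halved once every `decay_every`
    iterations, as described in the experiment setup.
    """
    if iteration < decay_start:
        return base_lr
    # Count how many halvings have occurred by this iteration
    # (the first halving happens exactly at `decay_start`).
    n_halvings = (iteration - decay_start) // decay_every + 1
    return base_lr * (0.5 ** n_halvings)
```

Over the stated 50k iterations this yields 0.001 until iteration 25k, then 5e-4, 2.5e-4, and so on, ending with five halvings by the final iteration.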