Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

Authors: Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5, we thoroughly evaluate our approach across four datasets to study scientific questions related to a) prediction quality, b) generalization to time horizons longer than training, c) generalization to unseen configurations, d) planning ability for downstream tasks."
Researcher Affiliation | Academia | Haozhi Qi (UC Berkeley), Xiaolong Wang (UC San Diego), Deepak Pathak (CMU), Yi Ma (UC Berkeley), Jitendra Malik (UC Berkeley)
Pseudocode | Yes | Algorithm 1: "Planning Algorithm for Simulated Billiard and PHYRE"
Open Source Code | Yes | "Code, pre-trained models, and more visualization results are available at our Website."
Open Datasets | Yes | "PHYRE: We use the BALL-tier of the PHYRE benchmark (Bakhtin et al., 2019)." "Shape Stacks (SS): This dataset contains multiple stacked objects (cubes, cylinders, or balls) (Ye et al., 2019)."
Dataset Splits | Yes | "The benchmark provides two evaluation settings: 1) within task generalization (PHYRE-W), where the testing environments contain the same object category but different sizes and positions; 2) cross task generalization (PHYRE-C)... We report prediction using the official fold 0 and the physical reasoning performance averaged on 10 folds."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer Kingma & Ba (2014) with cosine decay Loshchilov & Hutter (2016)" but does not specify versions for software libraries (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
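The cited schedule (Loshchilov & Hutter, 2016) anneals the learning rate along a half cosine from its initial value toward a minimum. The paper gives no implementation details, so the following is only a minimal stdlib sketch of that schedule, assuming a floor of `min_lr = 0` and no warm restarts:

```python
import math


def cosine_decay_lr(base_lr, cur_iter, max_iter, min_lr=0.0):
    """Cosine-annealed learning rate (Loshchilov & Hutter, 2016).

    Decays from base_lr at cur_iter = 0 down to min_lr at
    cur_iter = max_iter, following half of a cosine curve.
    """
    progress = cur_iter / max_iter
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In frameworks such as PyTorch, the equivalent built-in is `torch.optim.lr_scheduler.CosineAnnealingLR`, applied on top of an Adam optimizer.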
Experiment Setup | Yes | The default number of input frames is N = 4, except N = 1 for Shape Stacks and PHYRE. We set d to 256, except d = 64 for simulated billiards. During training, T (denoted T_train) is set to 20 for Sim B and Real B, 5 for PHYRE, and 15 for fair comparison with Ye et al. (2019). The discount factor λ_t is set to (current_iter / max_iter)^t.
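Reading the schedule as λ_t = (current_iter / max_iter)^t (exponent t, as the layout suggests), the weight for distant prediction steps starts near zero and ramps up over training, so the model first learns short-horizon dynamics before long-horizon ones. A minimal sketch under that assumption:

```python
def discount_weight(t, current_iter, max_iter):
    """Per-timestep loss weight lambda_t = (current_iter / max_iter) ** t.

    Early in training the ratio current_iter / max_iter is small, so losses
    at distant timesteps t are heavily down-weighted; at the end of training
    the ratio is 1 and every timestep contributes equally.
    """
    return (current_iter / max_iter) ** t
```

For example, halfway through training (`current_iter / max_iter = 0.5`) the loss at timestep t = 2 is weighted by 0.25, while at the final iteration all timesteps get weight 1.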