Learning Long-term Visual Dynamics with Region Proposal Interaction Networks
Authors: Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we thoroughly evaluate our approach across four datasets to study scientific questions related to a) prediction quality, b) generalization to time horizons longer than training, c) generalization to unseen configurations, d) planning ability for downstream tasks. |
| Researcher Affiliation | Academia | Haozhi Qi UC Berkeley Xiaolong Wang UC San Diego Deepak Pathak CMU Yi Ma UC Berkeley Jitendra Malik UC Berkeley |
| Pseudocode | Yes | Algorithm 1: Planning Algorithm for Simulated Billiard and PHYRE |
| Open Source Code | Yes | Code, pre-trained models, and more visualization results are available at our Website. |
| Open Datasets | Yes | PHYRE: We use the BALL-tier of the PHYRE benchmark (Bakhtin et al., 2019).; Shape Stacks (SS): This dataset contains multiple stacked objects (cubes, cylinders, or balls) (Ye et al., 2019). |
| Dataset Splits | Yes | The benchmark provides two evaluation settings: 1) within task generalization (PHYRE-W), where the testing environments contain the same object category but different sizes and positions; 2) cross task generalization (PHYRE-C)... We report prediction using the official fold 0 and the physical reasoning performance averaged on 10 folds. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer Kingma & Ba (2014) with cosine decay Loshchilov & Hutter (2016)" but does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python version). |
| Experiment Setup | Yes | The default number of input frames is N = 4, except N = 1 for Shape Stacks and PHYRE. We set d to 256, except for simulated billiards (Sim B), where d is 64. During training, T (denoted $T_{\text{train}}$) is set to 20 for Sim B and Real B, 5 for PHYRE, and 15 for Shape Stacks for fair comparison with Ye et al. (2019). The discount factor $\lambda_t$ is set to $(\text{current\_iter} / \text{max\_iter})^t$. |
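
The discount factor quoted in the Experiment Setup row, (current_iter / max_iter) raised to the power t, can be illustrated with a short sketch. The function names `discount_factors` and `weighted_loss` are hypothetical, and two things are assumptions rather than facts taken from the table: that t runs from 1 to T, and that the factor multiplies a per-step prediction error that is then summed. This is not the authors' released code.

```python
from typing import List


def discount_factors(current_iter: int, max_iter: int, horizon: int) -> List[float]:
    """Return [lambda_1, ..., lambda_T] with lambda_t = (current_iter / max_iter) ** t.

    Assumption: t is indexed from 1 to `horizon` (T).
    """
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    if not 0 <= current_iter <= max_iter:
        raise ValueError("current_iter must lie in [0, max_iter]")
    ratio = current_iter / max_iter
    return [ratio ** t for t in range(1, horizon + 1)]


def weighted_loss(per_step_errors: List[float], current_iter: int, max_iter: int) -> float:
    """Sum per-step errors weighted by lambda_t (an assumed use of the factor)."""
    weights = discount_factors(current_iter, max_iter, len(per_step_errors))
    return sum(w * e for w, e in zip(weights, per_step_errors))


if __name__ == "__main__":
    # Halfway through training, later time steps are strongly down-weighted.
    print(discount_factors(50, 100, 3))
    # At the end of training, every time step has weight 1.
    print(discount_factors(100, 100, 3))
```

Under this reading, the ratio is small early in training, so the weights shrink quickly with t and the near-term steps dominate. As the ratio approaches 1, all T steps contribute almost equally.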