On the Learning Mechanisms in Physical Reasoning
Authors: Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we take a closer look at this assumption, exploring this fundamental hypothesis by comparing two learning mechanisms: Learning from Dynamics (LfD) and Learning from Intuition (LfI). In the first experiment, we directly examine and compare these two mechanisms. Results show a surprising finding: Simple LfI is better than or on par with state-of-the-art LfD. Taken together, the results on the challenging benchmark of PHYRE [3] show that LfI is, if not better, as good as LfD with bells and whistles for dynamics prediction. |
| Researcher Affiliation | Academia | (1) School of Intelligence Science and Technology, Peking University; (2) Institute for Artificial Intelligence, Peking University; (3) Department of Automation, Tsinghua University; (4) Department of Computer Science, University of California, Los Angeles; (5) Beijing Institute for General Artificial Intelligence (BIGAI) |
| Pseudocode | Yes | Algorithm 1: Parallel optimization of LfD (variables ϕ and θ) and Algorithm 2: Serial optimization of LfD (variables ϕ and θ). A toy sketch contrasting the two optimization schemes follows the table. |
| Open Source Code | No | The paper provides a 'Project Website' link (https://lishiqianhugh.github.io/LfID_Page) but no direct link to a source code repository or an explicit statement about releasing the code for the described methodology. |
| Open Datasets | Yes | Environment: PHYRE-B is a goal-driven benchmark consisting of 25 different task templates of physical puzzles that can be solved by placing a red ball (hence the B). Each template has 100 similar tasks, which enables two different evaluation settings. (i) Within-template setting: train on 80% of the tasks in each template and test on the remaining 20% of each template. (ii) Cross-template setting: train on all tasks of 20 of the 25 templates and test on the remaining five previously unseen templates. A split-construction sketch follows the table. |
| Dataset Splits | No | The paper describes a 'Within-template setting: train on 80% of the tasks in each template and test on the remaining 20% of each template' and a 'Cross-template setting: train on all tasks of 20 of the 25 templates and test on the remaining five previously unseen templates', but does not explicitly mention a separate validation dataset split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. It mentions 'Additional training details are in the supplementary material', implying hardware details might be there, but they are not present in the main text. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., RPIN, ViT, PredRNN, TimeSformer, BEiT, Swin Transformer) but does not provide specific version numbers for any software dependencies required for reproducibility. |
| Experiment Setup | Yes | In this section, we briefly introduce the challenging physical reasoning benchmark of PHYRE-B and the setup of our experiments. The RPIN model leverages a convolutional interaction network based on object-centric representation and predicts the object states (bounding boxes and masks) into the future. When solving a PHYRE task, the model first predicts 10 time steps into the future for an action and then recruits an MLP-based task-solution model to predict the outcome. We vary the number of time frames supplied to the model: we consider inputs of length 1, 2, 4, and 8 frames extracted directly from the simulator with a time interval of 1 second. For parallel optimization, we train an end-to-end framework by integrating the output from PredRNN with the input of TimeSformer. The model parameters ϕ and θ are updated simultaneously by backpropagating both the dynamics-learning loss and the final cross-entropy loss. A pipeline sketch follows the table. |
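
The split construction below is a minimal sketch of the two PHYRE-B evaluation settings quoted in the Open Datasets row; the template/task indexing, helper names, and RNG seed are illustrative assumptions rather than the benchmark's own fold API.

```python
# Minimal sketch of the within-template and cross-template splits on
# PHYRE-B (25 templates x 100 tasks). Helper names and the seed are
# illustrative; the real benchmark ships its own fold definitions.
import random

NUM_TEMPLATES = 25
TASKS_PER_TEMPLATE = 100

def within_template_split(train_ratio=0.8, seed=0):
    """Train on 80% of the tasks in every template, test on the rest."""
    rng = random.Random(seed)
    train, test = [], []
    for template in range(NUM_TEMPLATES):
        tasks = [(template, task) for task in range(TASKS_PER_TEMPLATE)]
        rng.shuffle(tasks)
        cut = int(train_ratio * len(tasks))
        train += tasks[:cut]
        test += tasks[cut:]
    return train, test

def cross_template_split(num_test_templates=5, seed=0):
    """Train on all tasks of 20 templates, test on 5 unseen templates."""
    rng = random.Random(seed)
    templates = list(range(NUM_TEMPLATES))
    rng.shuffle(templates)
    held_out = set(templates[:num_test_templates])
    train = [(t, k) for t in range(NUM_TEMPLATES) if t not in held_out
             for k in range(TASKS_PER_TEMPLATE)]
    test = [(t, k) for t in held_out for k in range(TASKS_PER_TEMPLATE)]
    return train, test
```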
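
The Experiment Setup row describes an RPIN-style LfD pipeline: an object-centric dynamics model rolls object states 10 steps into the future, and an MLP-based task-solution model scores the outcome. The sketch below only illustrates that two-stage structure; module names, tensor shapes, hidden sizes, and the pooling are placeholder assumptions, not the authors' implementation.

```python
# Hedged sketch of an RPIN-style LfD pipeline: roll object states forward,
# then score the rollout with an MLP task-solution head.
import torch
import torch.nn as nn

NUM_FUTURE_STEPS = 10          # roll-out horizon used when solving a task
INPUT_LENGTHS = (1, 2, 4, 8)   # frame counts compared in the paper, 1 s apart

class ObjectDynamics(nn.Module):
    """Toy stand-in for RPIN's interaction network over object states."""
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.step = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, state_dim))

    def forward(self, states, horizon=NUM_FUTURE_STEPS):
        # states: (batch, objects, state_dim) at the last observed frame
        rollout, cur = [], states
        for _ in range(horizon):
            cur = cur + self.step(cur)            # residual one-step prediction
            rollout.append(cur)
        return torch.stack(rollout, dim=1)        # (batch, horizon, objects, state_dim)

class TaskSolutionMLP(nn.Module):
    """MLP head mapping the predicted rollout to a solved/unsolved score."""
    def __init__(self, state_dim=4, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, rollout):
        pooled = rollout.mean(dim=(1, 2))         # pool over time and objects
        return self.mlp(pooled).squeeze(-1)       # higher = more likely to solve
```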
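
Finally, the Pseudocode row contrasts parallel (Algorithm 1) and serial (Algorithm 2) optimization of LfD. The toy sketch below shows the difference: the parallel scheme backpropagates the dynamics loss and the final cross-entropy loss through both parameter sets (ϕ and θ) in one step, while the serial scheme fits the dynamics model first and freezes it before training the task-solution head. The stand-in modules, loss weighting, and optimizer choices are assumptions; the paper itself uses PredRNN and TimeSformer.

```python
# Toy contrast between parallel (joint) and serial optimization of LfD.
# DynamicsModel and TaskSolutionModel are placeholders for PredRNN (phi)
# and TimeSformer (theta); shapes and losses are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicsModel(nn.Module):                 # stand-in for PredRNN (phi)
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim, dim)
    def forward(self, frames):                  # frames: (batch, time, dim)
        return self.net(frames)                 # predicted future features

class TaskSolutionModel(nn.Module):             # stand-in for TimeSformer (theta)
    def __init__(self, dim=64):
        super().__init__()
        self.head = nn.Linear(dim, 2)           # solved / not solved
    def forward(self, rollout):
        return self.head(rollout.mean(dim=1))   # pool over time, then classify

def parallel_step(dyn, cls, frames, future, label, opt, w=1.0):
    """Algorithm 1 style: one optimizer over phi and theta, both losses
    backpropagated in a single step."""
    rollout = dyn(frames)
    loss = F.mse_loss(rollout, future) + w * F.cross_entropy(cls(rollout), label)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def serial_training(dyn, cls, loader, epochs=1):
    """Algorithm 2 style: fit the dynamics model first, then freeze phi
    and fit the task-solution model on its rollouts."""
    opt_phi = torch.optim.Adam(dyn.parameters())
    for _ in range(epochs):
        for frames, future, _ in loader:
            loss = F.mse_loss(dyn(frames), future)
            opt_phi.zero_grad(); loss.backward(); opt_phi.step()
    for p in dyn.parameters():
        p.requires_grad_(False)
    opt_theta = torch.optim.Adam(cls.parameters())
    for _ in range(epochs):
        for frames, _, label in loader:
            loss = F.cross_entropy(cls(dyn(frames)), label)
            opt_theta.zero_grad(); loss.backward(); opt_theta.step()

# Parallel usage (illustrative): one optimizer spanning both parameter sets.
# dyn, cls = DynamicsModel(), TaskSolutionModel()
# opt = torch.optim.Adam(list(dyn.parameters()) + list(cls.parameters()))
# parallel_step(dyn, cls, frames, future, label, opt)
```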