I-PHYRE: Interactive Physical Reasoning

Authors: Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our exploration involves three planning strategies and examines several supervised and reinforcement agents' zero-shot generalization proficiency on I-PHYRE. The outcomes highlight a notable gap between existing learning algorithms and human performance, emphasizing the imperative for further research into equipping agents with interactive physical reasoning capabilities.
Researcher Affiliation | Academia | Shiqian Li (1,2), Kewen Wu (2,3), Chi Zhang (2), Yixin Zhu (1); 1: Institute for Artificial Intelligence, Peking University; 2: National Key Laboratory of General Artificial Intelligence, BIGAI; 3: Department of Automation, Tsinghua University
Pseudocode | No | The paper describes methods in prose and with figures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a project website link (https://lishiqianhugh.github.io/IPHYRE/) but does not explicitly state that the source code for the methodology is available there, nor is it a direct link to a code repository.
Open Datasets | No | The paper states that I-PHYRE includes 40 distinctive games and defines splits (a 'basic split' for training), which implies the creation of a dataset, but it does not provide concrete access information (e.g., URL, DOI, or a citation to a public dataset release) for these games.
Dataset Splits | Yes | We have formulated 40 distinct games and segregated them into four splits, including a basic split for training and three additional generalization splits. These are intended to assess the authentic interactive physical understanding of agents beyond mere data fitting. Specifically, the generalization splits are designed to test the agents' ability to (i) discern key physical elements amidst noise, (ii) strategize for long sequences through compositionality, and (iii) conform to more stringent timing constraints. Agents are trained on the basic split and evaluated on the remaining generalization splits. (A protocol sketch appears after this table.)
Hardware Specification | Yes | We run all our experiments on RTX 3090 GPUs.
Software Dependencies | No | The paper mentions the software used ('implemented using pymunk, rendered in pygame... and integrated into Gym') but does not specify version numbers for these components, which are needed for reproducibility. (A version-check sketch appears after this table.)
Experiment Setup | Yes | The policy architecture of model-free learners is an MLP with two hidden layers of size 256, and the activation function is tanh. A2C-I, A2C-C, and SAC-C are trained with a learning rate of 1 × 10^-5. All other models are trained with a learning rate of 1 × 10^-6. SAC-I is trained for 57k steps. SAC-O and SAC-C are trained for 80k steps. A2C-I is trained for 426k steps. Other models are trained for 800k steps. (A minimal policy sketch appears after this table.)
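
The split protocol quoted in the Dataset Splits row (train on the basic split, evaluate zero-shot on the three generalization splits) can be illustrated with a minimal Python sketch. The split names, and the train/evaluate helpers below, are assumptions for illustration only; they are not the benchmark's actual identifiers or API.

    # Hypothetical illustration of the train / zero-shot-evaluation protocol
    # described in the Dataset Splits row. Split names are guessed from the
    # paper's description of the four splits; `train` and `evaluate` are
    # user-supplied callables, not I-PHYRE functions.
    TRAIN_SPLIT = "basic"
    GENERALIZATION_SPLITS = ["noisy", "compositional", "multi_ball"]  # assumed names

    def run_protocol(train, evaluate):
        """Train only on the basic split, then evaluate zero-shot on each
        generalization split; returns a dict of per-split scores."""
        agent = train(TRAIN_SPLIT)
        return {split: evaluate(agent, split) for split in GENERALIZATION_SPLITS}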
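
Since the Software Dependencies row notes that version numbers for pymunk, pygame, and Gym are not reported, the short sketch below shows one way a reader could at least record the versions installed in their own environment. It only prints what is installed locally; it does not recover the versions the authors used.

    from importlib.metadata import PackageNotFoundError, version

    # Libraries named in the paper; pinned versions are not reported there.
    for pkg in ("pymunk", "pygame", "gym"):
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not installed")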
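
The Experiment Setup row specifies an MLP policy with two hidden layers of size 256 and tanh activations, plus per-model learning rates. A minimal PyTorch sketch of such a policy follows; the input/output dimensions and the choice of Adam are assumptions, since the quoted setup does not state them.

    import torch
    import torch.nn as nn

    class PolicyMLP(nn.Module):
        """MLP policy with two hidden layers of size 256 and tanh activations,
        matching the architecture quoted in the Experiment Setup row."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, act_dim),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    # Learning rates from the row above: 1e-5 for A2C-I, A2C-C, SAC-C; 1e-6 otherwise.
    policy = PolicyMLP(obs_dim=108, act_dim=12)  # placeholder dimensions
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)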