I-PHYRE: Interactive Physical Reasoning
Authors: Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our exploration involves three planning strategies and examines several supervised and reinforcement learning agents' zero-shot generalization proficiency on I-PHYRE. The outcomes highlight a notable gap between existing learning algorithms and human performance, emphasizing the imperative for more research in equipping agents with interactive physical reasoning capabilities. |
| Researcher Affiliation | Academia | Shiqian Li (1,2), Kewen Wu (2,3), Chi Zhang (2), Yixin Zhu (1); 1: Institute for Artificial Intelligence, Peking University; 2: National Key Laboratory of General Artificial Intelligence, BIGAI; 3: Department of Automation, Tsinghua University |
| Pseudocode | No | The paper describes methods in prose and with figures but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project website link (https://lishiqianhugh.github.io/IPHYRE/) but does not explicitly state that the source code for the methodology is available there, nor is it a direct link to a code repository. |
| Open Datasets | No | The paper states that I-PHYRE includes 40 distinctive games and defines splits ('basic split' for training), which implies a dataset creation, but it does not provide concrete access information (e.g., URL, DOI, or a citation to a public dataset release) for these games/dataset. |
| Dataset Splits | Yes | We have formulated 40 distinct games and segregated them into four splits, including a basic split for training and three additional generalization splits. These are intended to assess the authentic interactive physical understanding of agents beyond mere data fitting. Specifically, the generalization splits are designed to test the agents' ability to (i) discern key physical elements amidst noise, (ii) strategize for long sequences through compositionality, and (iii) conform to more stringent timing constraints. Agents are trained on the basic split and evaluated on the remaining generalization splits. A hedged sketch of this protocol appears after the table. |
| Hardware Specification | Yes | We run all our experiments on RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software used ('implemented using pymunk, rendered in pygame... and integrated into Gym') but does not specify version numbers for these components, which is required for reproducibility. |
| Experiment Setup | Yes | The policy architecture of the model-free learners is an MLP with two hidden layers of size 256 and tanh activations. A2C-I, A2C-C, and SAC-C are trained with a learning rate of 1 × 10⁻⁵; all other models use 1 × 10⁻⁶. SAC-I is trained for 57k steps, SAC-O and SAC-C for 80k steps, A2C-I for 426k steps, and the other models for 800k steps. A hedged sketch of this policy architecture appears after the table. |
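
The train/evaluate protocol quoted in the Dataset Splits row can be written down as a short sketch. The snippet below is a minimal, hypothetical illustration: the split names, game identifiers, and the `agent`/`make_env` interfaces are placeholders and not part of the I-PHYRE codebase; only the structure of the protocol (train on the basic split, evaluate zero-shot on the three generalization splits) comes from the paper.

```python
# Hypothetical sketch of the I-PHYRE evaluation protocol described above.
# Split names and game identifiers are placeholders, not the benchmark's
# actual 40-game list.
SPLITS = {
    "basic": ["basic_game_01", "basic_game_02"],      # training games
    "noisy": ["noisy_game_01"],                        # key elements amid noise
    "compositional": ["compositional_game_01"],        # long action sequences
    "timing": ["timing_game_01"],                      # stricter timing constraints
}

TRAIN_SPLIT = "basic"
EVAL_SPLITS = [name for name in SPLITS if name != TRAIN_SPLIT]


def zero_shot_scores(agent, make_env):
    """Evaluate a trained agent on each generalization split without further training.

    `agent` must expose a `rollout(env)` method returning a scalar score, and
    `make_env(game_name)` must build an environment for one game; both are
    placeholders for user-supplied code.
    """
    return {
        split: [agent.rollout(make_env(game)) for game in SPLITS[split]]
        for split in EVAL_SPLITS
    }
```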
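
The Experiment Setup row specifies the policy network of the model-free learners closely enough to sketch it directly. The code below assumes PyTorch; the observation/action dimensions and the choice of Adam as optimizer are assumptions, while the two 256-unit hidden layers, tanh activations, and learning rates follow the paper.

```python
import torch
import torch.nn as nn


class MLPPolicy(nn.Module):
    """MLP policy with two hidden layers of size 256 and tanh activations,
    matching the architecture stated in the experiment setup. The observation
    and action dimensions are illustrative placeholders."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Learning rates reported in the paper: 1e-5 for A2C-I, A2C-C, and SAC-C;
# 1e-6 for all other models. The optimizer choice here is an assumption.
policy = MLPPolicy(obs_dim=64, act_dim=8)  # dimensions are placeholders
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)
```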