Reasoning About Physical Interactions with Object-Oriented Prediction and Planning
Authors: Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we consider not only the accuracy of the physical predictions of the model, but also its utility for downstream tasks that require an actionable representation of intuitive physics. After training our model on an image prediction task, we can use its learned representations to build block towers more complicated than those observed during training. |
| Researcher Affiliation | Academia | University of California, Berkeley; Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Planning Procedure |
| Open Source Code | No | The paper refers readers to 'people.eecs.berkeley.edu/~janner/o2p2 for videos of the evaluation' but does not state that source code for the method is released. |
| Open Datasets | No | The paper states, 'In total, we collected 60,000 training images using the MuJoCo simulator.' and does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions 'training images' and 'held-out random configurations' but does not provide specific percentages, counts, or predefined splits for training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or specific computing cluster specifications) used for running experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2015)' but does not provide specific version numbers for software dependencies such as libraries, frameworks, or the MuJoCo simulator itself. |
| Experiment Setup | Yes | Objects were represented as 256-dimensional vectors. The perception module had four convolutional layers of {32, 64, 128, 256} channels, a kernel size of 4, and a stride of 2 followed by a single fully-connected layer with output size matching the object representation dimension. Both MLPs in the physics engine had two hidden layers each of size 512. The rendering networks had convolutional layers with {128, 64, 32, 3} channels (or 1 output channel in the case of the heatmap predictor), kernel sizes of {5, 5, 6, 6}, and strides of 2. We used the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-3. In practice, we used CEM starting from a uniform distribution with five iterations, 1000 samples per iteration, and used the top 10% of samples to fit the subsequent iteration's sampling distribution. (Hedged sketches of this architecture and of the CEM loop follow the table.) |
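
For reference, below is a minimal PyTorch sketch of the setup quoted in the last table row. The channel counts, kernel sizes, strides, hidden sizes, object dimension, and optimizer settings come from the paper's reported setup; everything else, including the module names, the 64x64 input resolution, the paddings, the use of transposed convolutions in the renderer, and the physics MLPs' input dimensions, is an illustrative assumption rather than the authors' implementation.

```python
import torch
import torch.nn as nn

OBJECT_DIM = 256  # objects are represented as 256-dimensional vectors

class PerceptionModule(nn.Module):
    """Four conv layers of {32, 64, 128, 256} channels, kernel size 4, stride 2,
    followed by a single fully-connected layer to the object representation."""
    def __init__(self, in_channels=3, spatial=4):  # spatial=4 assumes 64x64 inputs
        super().__init__()
        channels = [in_channels, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                       nn.ReLU()]
        self.convs = nn.Sequential(*layers)
        self.fc = nn.Linear(256 * spatial * spatial, OBJECT_DIM)

    def forward(self, image):
        return self.fc(self.convs(image).flatten(start_dim=1))

def physics_mlp(in_dim, out_dim, hidden=512):
    """Two hidden layers of size 512, as reported for both physics-engine MLPs."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class RenderingNetwork(nn.Module):
    """Decoder with {128, 64, 32, 3} channels (1 channel for the heatmap head),
    kernel sizes {5, 5, 6, 6}, stride 2; transposed convs and paddings assumed."""
    def __init__(self, out_channels=3, spatial=4):
        super().__init__()
        self.spatial = spatial
        self.fc = nn.Linear(OBJECT_DIM, 256 * spatial * spatial)
        chans, kernels = [256, 128, 64, 32, out_channels], [5, 5, 6, 6]
        layers = []
        for c_in, c_out, k in zip(chans[:-1], chans[1:], kernels):
            layers += [nn.ConvTranspose2d(c_in, c_out, kernel_size=k, stride=2,
                                          padding=(k - 2) // 2),
                       nn.ReLU()]
        layers.pop()  # no activation after the final (image/heatmap) layer
        self.deconvs = nn.Sequential(*layers)

    def forward(self, obj_vec):
        h = self.fc(obj_vec).view(-1, 256, self.spatial, self.spatial)
        return self.deconvs(h)

# Input dimensions of the two physics MLPs (per-object transition and pairwise
# interaction) are assumptions; only their hidden sizes are reported.
modules = nn.ModuleList([
    PerceptionModule(),
    physics_mlp(OBJECT_DIM, OBJECT_DIM),      # transition MLP
    physics_mlp(2 * OBJECT_DIM, OBJECT_DIM),  # pairwise interaction MLP
    RenderingNetwork(out_channels=3),         # image renderer
    RenderingNetwork(out_channels=1),         # heatmap predictor
])
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-3)  # Adam, lr 1e-3
```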
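The same row also reports a cross-entropy method (CEM) planning loop: a uniform initial distribution, five iterations, 1000 samples per iteration, and the top 10% of samples used to fit the next iteration's sampling distribution. The sketch below implements such a loop in NumPy; the Gaussian refit, the clipping, and `score_fn` (a stand-in for the paper's objective, which scores candidate actions via the learned object representations) are assumptions, not the authors' exact procedure.

```python
import numpy as np

def cem_plan(score_fn, action_dim, lower, upper,
             iterations=5, samples=1000, elite_frac=0.10, rng=None):
    """CEM as described above: start from a uniform distribution, run five
    iterations of 1000 samples, and refit the sampling distribution to the
    top 10% of samples at each iteration."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(lower, upper, size=(samples, action_dim))
    n_elite = max(1, int(elite_frac * samples))
    for _ in range(iterations):
        scores = np.array([score_fn(a) for a in actions])
        elites = actions[np.argsort(scores)[-n_elite:]]   # top 10% by score
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        actions = np.clip(rng.normal(mu, sigma, size=(samples, action_dim)),
                          lower, upper)
    return mu  # mean of the final elite distribution

# Hypothetical usage with a toy objective standing in for the model's
# goal-matching score over predicted object representations.
best_action = cem_plan(lambda a: -np.sum((a - 0.3) ** 2),
                       action_dim=2, lower=0.0, upper=1.0)
```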