Learning to Perform Physics Experiments via Deep Reinforcement Learning

Authors: Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We found that deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and the cost incurred by the agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations. We also compare our learned experimentation policies to randomized baselines and show that the learned policies lead to better predictions.
Researcher Affiliation | Collaboration | Misha Denil1, Pulkit Agrawal2, Tejas D Kulkarni1, Tom Erez1, Peter Battaglia1, Nando de Freitas1,3; 1DeepMind, 2University of California, Berkeley, 3Canadian Institute for Advanced Research
Pseudocode | No | The paper describes the agent architecture and training procedure verbally but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statements about releasing source code or links to a code repository.
Open Datasets | No | The paper describes custom-designed simulated environments ("Which is Heavier" and "Towers") where objects and their properties are randomly generated for each episode, rather than using an existing public dataset or providing access details for a newly created one.
Dataset Splits | No | The paper describes training and testing phases, but does not explicitly define numerical training, validation, and testing dataset splits (e.g., percentages or sample counts). The environments are continuous simulations rather than static datasets with pre-defined splits.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or cloud instance details) used for running the experiments.
Software Dependencies | No | The paper mentions training agents using "Asynchronous Advantage Actor Critic (Mnih et al., 2016)", which is an algorithm, but it does not list any specific software libraries or their version numbers (e.g., TensorFlow, PyTorch, or Python versions).
Experiment Setup | Yes | For all experiments we train recurrent agents using an LSTM with 100 hidden units. When training from pixels we first scale the observations to 84x84 pixels and feed them through three convolutional layers, each followed by a ReLU non-linearity. The three layers have 32, 64, 64 square filters with sizes 8, 4, 3, which are applied at strides of 4, 2, 1 respectively. We train the agents using Asynchronous Advantage Actor Critic (Mnih et al., 2016), but ensure that the unroll length is always greater than the timeout length so the agent network is unrolled over the entirety of each episode. We set the episode length limit to 100 steps in this environment. We select one of the blocks uniformly at random to be the heavy block and designate the remaining three as light blocks. We sample the mass of the heavy block from Beta(β, 1) and the mass of the light blocks from Beta(1, β). We trained several agents at three different difficulties corresponding to β ∈ {3, 5, 10}. We used a discount factor of γ = 0.95 and γ = 0.99. For the direct actuators we use an episode timeout of 26 steps. If the physics simulation time step is 0.025s and the control time step is 0.1s, the same action is repeated 4 times.
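
The setup row above specifies the pixel encoder and recurrent core precisely enough to sketch in code. Below is a minimal, illustrative PyTorch sketch of that architecture, assuming a 3-channel 84x84 observation; the class name, input handling, and the absence of policy and value heads are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class PixelEncoderLSTM(nn.Module):
    """Sketch of the described encoder: three conv layers (32, 64, 64 filters;
    kernel sizes 8, 4, 3; strides 4, 2, 1), each followed by a ReLU, feeding
    an LSTM with 100 hidden units. Class and argument names are illustrative."""

    def __init__(self, in_channels: int = 3, lstm_hidden: int = 100):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # An 84x84 input leaves the conv stack as a 64 x 7 x 7 feature map.
        self.lstm = nn.LSTM(input_size=64 * 7 * 7, hidden_size=lstm_hidden,
                            batch_first=True)

    def forward(self, frames, state=None):
        # frames: (batch, time, channels, 84, 84); the LSTM is unrolled over
        # the whole episode, matching the unroll-length condition in the setup.
        b, t, c, h, w = frames.shape
        feats = self.conv(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        return self.lstm(feats, state)
```

The mass-sampling procedure for the "Which is Heavier" environment is likewise concrete. Here is a short NumPy sketch of that sampling step, where the function name and return format are assumptions rather than details from the paper:

```python
import numpy as np


def sample_block_masses(beta: float, n_blocks: int = 4, rng=np.random):
    """One block, chosen uniformly at random, is heavy with mass ~ Beta(beta, 1);
    the remaining blocks are light with masses ~ Beta(1, beta)."""
    heavy = rng.randint(n_blocks)                 # index of the heavy block
    masses = rng.beta(1.0, beta, size=n_blocks)   # light-block masses
    masses[heavy] = rng.beta(beta, 1.0)           # overwrite the heavy block's mass
    return masses, heavy


# Larger beta pushes the heavy mass toward 1 and the light masses toward 0,
# widening the typical mass gap (the setup uses beta in {3, 5, 10}).
masses, heavy_idx = sample_block_masses(beta=10)
```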