Catching heuristics are optimal control policies
Authors: Boris Belousov, Gerhard Neumann, Constantin A. Rothkopf, Jan R. Peters
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control. |
| Researcher Affiliation | Academia | Boris Belousov*, Gerhard Neumann*, Constantin A. Rothkopf**, Jan Peters* *Department of Computer Science, TU Darmstadt **Cognitive Science Center & Department of Psychology, TU Darmstadt |
| Pseudocode | No | The paper describes computational models and optimization methods in text and mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing its source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper describes simulated experiments in which scenarios are reproduced and parameters are varied. It does not mention using a publicly available dataset for training in the conventional machine-learning sense, nor does it provide concrete access information for one. It does mention fitting parameters to … |
| Dataset Splits | No | The paper describes simulated experiments and scenarios but does not discuss train, validation, or test dataset splits, as it does not rely on a traditional dataset in that manner. The experiments are based on a computational model and simulated conditions. |
| Hardware Specification | No | The paper describes the computational model and its parameters but does not specify the hardware (e.g., CPU, GPU models, memory) used to run the simulations or experiments. |
| Software Dependencies | No | The paper mentions using specific software tools: "Derivatives of the cost function are computed using CasADi [2]. Non-linear optimization is carried out by Ipopt [26]. L-BFGS and warm-starts used." However, it does not provide version numbers for any of these software dependencies. (A minimal sketch of this solver configuration appears after the table.) |
| Experiment Setup | Yes | The paper provides details on the experimental setup, including: "The state of the system x consists of the location and velocity of the ball in 3D space, the location and velocity of the catching agent in the ground plane, and the agent's gaze direction represented by a unit 3D vector. The agent's actions u consist of the force applied to the center of mass and the rate of change of the gaze direction." It also details noise models and cost function components: "The ball is modeled as a parabolic flight perturbed by Gaussian noise with variance $\sigma_b^2$. The parameters $\{\sigma_{\min}, \sigma_{\max}\}$ control the scale of the noise... $J_{\text{final}} = w_0 \lVert \mu_b - \mu_c \rVert_2^2 + w_1 \operatorname{tr} \Sigma_N$... The weights $w_0$ and $w_1$ are set to optimally approximate this negated log-probability. The desire of the agent to be energy efficient is encoded as a penalty on the control signals $J_{\text{energy}} = \tau \sum_{k=0}^{N-1} u_k^\top M u_k$ with the fixed duration $\tau$ of the discretized time steps and a diagonal weight matrix $M$ to trade off controls... We run the experiment at different noise levels and time delays and average the results over 10 trials. In all cases, the agent starts at the point (20, 5) looking towards the origin, while the ball flies from the origin towards the point (30, 15) in 3 s." The quoted cost terms are also sketched in code after the table. |
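For reference, the sketch below shows one way the reported toolchain could be wired together: CasADi's Opti stack formulating a small trajectory-optimization problem and handing it to Ipopt with a limited-memory (L-BFGS) Hessian approximation and warm-start options. The horizon, control dimension, weight matrix, and cost are illustrative placeholders, not the paper's actual POMDP model.

```python
import casadi as ca

# Placeholder horizon, control dimension, and step size -- not the paper's values.
N, n_u = 30, 2
tau = 0.1

opti = ca.Opti()
U = opti.variable(n_u, N)  # control trajectory u_0 ... u_{N-1}

# Placeholder quadratic cost standing in for J_energy = tau * sum_k u_k^T M u_k.
M = ca.diag(ca.DM([1.0, 1.0]))
J = tau * sum(ca.mtimes([U[:, k].T, M, U[:, k]]) for k in range(N))
opti.minimize(J)

# Ipopt with a limited-memory (L-BFGS) Hessian approximation and warm-start options.
opti.solver("ipopt", {}, {
    "hessian_approximation": "limited-memory",
    "warm_start_init_point": "yes",
})

opti.set_initial(U, 0)  # initial guess used as the warm start
sol = opti.solve()
print(sol.value(J))
```

Ipopt's `limited-memory` Hessian approximation is its built-in L-BFGS mode, which is presumably what the quoted "L-BFGS and warm-starts" refers to.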
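Similarly, here is a minimal NumPy rendering of the two quoted cost terms, $J_{\text{final}}$ and $J_{\text{energy}}$. The function names and the default weights are placeholders; the paper sets $w_0$ and $w_1$ by approximating a negated log-probability rather than fixing them by hand.

```python
import numpy as np

def final_cost(mu_b, mu_c, Sigma_N, w0=1.0, w1=1.0):
    """J_final = w0 * ||mu_b - mu_c||_2^2 + w1 * tr(Sigma_N).

    mu_b, mu_c: predicted mean positions of ball and catcher.
    Sigma_N:    terminal covariance.
    w0, w1:     placeholder weights.
    """
    diff = np.asarray(mu_b) - np.asarray(mu_c)
    return w0 * np.sum(diff ** 2) + w1 * np.trace(Sigma_N)

def energy_cost(U, M, tau):
    """J_energy = tau * sum_k u_k^T M u_k, with U of shape (N, n_u)."""
    U = np.asarray(U)
    return tau * float(np.einsum("ki,ij,kj->", U, M, U))

# Tiny usage example with made-up numbers.
print(final_cost([30.0, 15.0, 0.0], [29.5, 14.8, 0.0], 0.01 * np.eye(3)))
print(energy_cost(np.ones((30, 2)), np.diag([1.0, 1.0]), tau=0.1))
```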