Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Authors: Minhae Kwon, Saurabh Daptardar, Paul R. Schrater, Xaq Pitkow
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach using a simulated agent for which ground truth is known; thus, we verify our method by showing successful recovery of the internal model parameters. Figure 4B shows a two-dimensional contour plot of the approximate log-likelihood of observable data L(θ). Recall that the model parameters θ are high-dimensional, so here we plot only two dimensions of θ. The red line shows an example trajectory of parameters as IRC Algorithm 2 converges. Our approach estimates the θ̂ that maximizes the log-likelihood of the observable data L(θ). Figure 4C shows that the estimated parameters recovered by our algorithm closely match the agent's true parameters. |
| Researcher Affiliation | Collaboration | Minhae Kwon, School of Electronic Engineering, Soongsil University, Seoul, Republic of Korea (minhae@ssu.ac.kr); Saurabh Daptardar, Google Inc., Mountain View, CA, USA (saurabh.dptdr@gmail.com); Paul Schrater, Department of Computer Science, University of Minnesota, Minneapolis, MN, USA (schrater@umn.edu); Xaq Pitkow, Electrical and Computer Engineering, Rice University, Houston, TX, USA (xaq@rice.edu) |
| Pseudocode | Yes | Algorithm 1: Train Bayesian optimal control ensembles; Algorithm 2: Estimate the θ that best explains externally observable data |
| Open Source Code | No | The paper mentions related projects like 'The animal-AI testbed. http://animalaiolympics.com/AAI/' but does not provide a direct link or explicit statement about the open-sourcing of the code for the methodology described in this paper. |
| Open Datasets | No | The paper describes simulating a task ('catching fireflies') and using 'simulated experiences' but does not refer to a publicly available dataset with concrete access information for training. |
| Dataset Splits | No | The paper describes using 'simulated experiences' and a 'simulated agent' but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) as it operates in a simulated environment rather than on a static, pre-split dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Deep Deterministic Policy Gradient (DDPG)' and 'extended Kalman filter' but does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | The hyperparameters used to produce the results are provided in Appendix B, and the relationship between the number of trajectories and the accuracy of the parameter recovery is discussed in Appendix C. |
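The parameter-recovery result described above (IRC Algorithm 2 climbing the approximate log-likelihood L(θ) toward the agent's true parameters) can be illustrated with a minimal one-dimensional sketch. This is not the paper's implementation: `approx_log_likelihood` is a hypothetical quadratic stand-in for L(θ), and the finite-difference ascent loop is only meant to show the shape of the outer optimization.

```python
import numpy as np

def approx_log_likelihood(theta, theta_true=0.8):
    # Hypothetical stand-in for L(theta): a smooth surrogate that is
    # maximized at the simulated agent's true parameter value.
    return -(theta - theta_true) ** 2

def estimate_theta(theta0=0.0, lr=0.1, eps=1e-4, steps=200):
    # Sketch of the outer loop: ascend the approximate log-likelihood
    # with a finite-difference gradient until the estimate converges.
    theta = theta0
    for _ in range(steps):
        grad = (approx_log_likelihood(theta + eps)
                - approx_log_likelihood(theta - eps)) / (2 * eps)
        theta += lr * grad
    return theta

theta_hat = estimate_theta()
print(round(theta_hat, 3))  # converges toward the true parameter 0.8
```

In the paper the analogue of `theta` is high-dimensional and L(θ) is evaluated from observed trajectories under a trained Bayesian optimal control ensemble, but the logic is the same: the estimate θ̂ is whatever parameter setting maximizes the likelihood of the externally observable data.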