Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Authors: Minhae Kwon, Saurabh Daptardar, Paul R. Schrater, Xaq Pitkow
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach using a simulated agent for which ground truth is known; thus, we verify our method by showing successful recovery of the internal model parameters. Figure 4B shows a two-dimensional contour plot of the approximate log-likelihood of observable data L(θ). Recall that the model parameters θ are high-dimensional, so here we plot only two dimensions of θ. The red line shows an example trajectory of parameters as IRC Algorithm 2 converges. Our approach estimates the θ̂ that maximizes the log-likelihood of the observable data L(θ). Figure 4C shows that the estimated parameters recovered by our algorithm closely match the agent's true parameters. |
| Researcher Affiliation | Collaboration | Minhae Kwon, School of Electronic Engineering, Soongsil University, Seoul, Republic of Korea (minhae@ssu.ac.kr); Saurabh Daptardar, Google Inc., Mountain View, CA, USA (saurabh.dptdr@gmail.com); Paul Schrater, Department of Computer Science, University of Minnesota, Minneapolis, MN, USA (schrater@umn.edu); Xaq Pitkow, Electrical and Computer Engineering, Rice University, Houston, TX, USA (xaq@rice.edu) |
| Pseudocode | Yes | Algorithm 1: Train Bayesian optimal control ensembles; Algorithm 2: Estimate the θ that best explains externally observable data |
| Open Source Code | No | The paper mentions related projects like 'The animal-AI testbed. http://animalaiolympics.com/AAI/' but does not provide a direct link or explicit statement about the open-sourcing of the code for the methodology described in this paper. |
| Open Datasets | No | The paper describes simulating a task ('catching fireflies') and using 'simulated experiences' but does not refer to a publicly available dataset with concrete access information for training. |
| Dataset Splits | No | The paper describes using 'simulated experiences' and a 'simulated agent' but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) as it operates in a simulated environment rather than on a static, pre-split dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Deep Deterministic Policy Gradient (DDPG)' and 'extended Kalman filter' but does not specify version numbers for these or any other software components or libraries. |
| Experiment Setup | Yes | The hyperparameters used to produce the results are provided in Appendix B, and the relationship between the number of trajectories and the accuracy of the parameter recovery is discussed in Appendix C. |
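The parameter-recovery result described above (IRC Algorithm 2 climbing the approximate log-likelihood L(θ) toward the agent's true parameters) can be illustrated with a minimal one-dimensional sketch. This is not the paper's implementation: `approx_log_likelihood` is a hypothetical quadratic stand-in for L(θ), and the finite-difference ascent loop is only meant to show the shape of the outer optimization.

```python
import numpy as np

def approx_log_likelihood(theta, theta_true=0.8):
    # Hypothetical stand-in for L(theta): a smooth surrogate that is
    # maximized at the simulated agent's true parameter value.
    return -(theta - theta_true) ** 2

def estimate_theta(theta0=0.0, lr=0.1, eps=1e-4, steps=200):
    # Sketch of the outer loop: ascend the approximate log-likelihood
    # with a finite-difference gradient until the estimate converges.
    theta = theta0
    for _ in range(steps):
        grad = (approx_log_likelihood(theta + eps)
                - approx_log_likelihood(theta - eps)) / (2 * eps)
        theta += lr * grad
    return theta

theta_hat = estimate_theta()
print(round(theta_hat, 3))  # converges toward the true parameter 0.8
```

In the paper the analogue of `theta` is high-dimensional and L(θ) is evaluated from observed trajectories under a trained Bayesian optimal control ensemble, but the logic is the same: the estimate θ̂ is whatever parameter setting maximizes the likelihood of the externally observable data.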