Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

Authors: Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon (pp. 7886-7894)

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental "We provide theoretical and empirical evidence highlighting the benefits of our approach in the model misspecification regime compared to likelihood-based methods."; "This section aims to test the following hypotheses: The OMD agent with approximations from Section 6 achieves near-optimal returns. The performance of OMD is better compared to MLE under the model misspecification. Parameters θ of the OMD model have low likelihood, yet an agent acting with the Q-function trained with the model achieves near-optimal returns in the true MDP."; and Section 7, "Experiments with Function Approximation".
Researcher Affiliation Collaboration 1Mila, Université de Montréal; 2Vector Institute, University of Toronto; 3Google Research, Brain Team; 4Facebook CIFAR AI Chair
Pseudocode Yes Algorithm 1: Model-Based RL with OMD
Open Source Code No The paper links to a third-party implementation of Soft Actor-Critic (https://github.com/ikostrikov/jaxrl), which was used as the inner optimizer, but it does not provide access to the authors' own source code for the OMD method.
Open Datasets Yes Setup. We first use CartPole (Barto, Sutton, and Anderson 1983) and later include results on MuJoCo HalfCheetah (Todorov, Erez, and Tassa 2012) with similar findings further supporting our conclusions.
Dataset Splits No The paper mentions using the CartPole and HalfCheetah environments but does not provide specific details on the training, validation, and test dataset splits used for reproduction, such as exact percentages or sample counts.
Hardware Specification No The paper acknowledges Compute Canada for computational resources but does not provide specific hardware details, such as GPU or CPU models or memory specifications, used for running the experiments.
Software Dependencies No The paper mentions JAX and refers to a JAX-based implementation of Soft Actor-Critic, but it does not provide specific version numbers for these or any other software dependencies required to replicate the experiment.
Experiment Setup No The paper mentions some general settings, such as a soft Bellman operator with temperature α = 0.01 and the use of K steps for optimization, but it does not provide a comprehensive list of experimental details, such as learning rates, batch sizes, exact optimizer settings, or other hyperparameters required for full reproducibility. Hedged sketches of the soft Bellman update from Algorithm 1 and of the implicit model gradient follow below.
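
To make the Pseudocode and Experiment Setup rows above more concrete, here is a minimal sketch of the control-oriented training idea behind Algorithm 1, written for a tiny tabular MDP. It is an illustrative assumption rather than the authors' code: the MDP sizes, the reward table, the helper names soft_bellman_update, solve_q, and control_objective, and the choice of the soft value at state 0 as the outer objective are all hypothetical; only the soft-Bellman temperature α = 0.01 is taken from the paper's reported setup.

import jax
import jax.numpy as jnp

S, A = 5, 2                  # assumed tiny tabular MDP
GAMMA, ALPHA = 0.99, 0.01    # discount (assumed) and soft-Bellman temperature from the paper

def soft_bellman_update(q, theta, rewards):
    # One application of the soft Bellman operator under model theta,
    # where theta holds transition logits and softmax gives P(s' | s, a).
    p = jax.nn.softmax(theta, axis=-1)                 # (S, A, S)
    v = ALPHA * jax.nn.logsumexp(q / ALPHA, axis=-1)   # soft state values, (S,)
    return rewards + GAMMA * jnp.einsum('sat,t->sa', p, v)

def solve_q(theta, rewards, k=50):
    # Approximate the fixed point with K operator applications,
    # mirroring the truncated inner optimization the paper alludes to.
    q = jnp.zeros((S, A))
    for _ in range(k):
        q = soft_bellman_update(q, theta, rewards)
    return q

def control_objective(theta, rewards):
    # Outer objective: soft value of an assumed initial state 0 under the
    # Q-function induced by the model. (The paper's actual objective evaluates
    # the induced policy in the true MDP; this is a deliberate simplification.)
    q = solve_q(theta, rewards)
    return ALPHA * jax.nn.logsumexp(q[0] / ALPHA)

theta = jnp.zeros((S, A, S))   # model (transition) parameters
rewards = jnp.ones((S, A))     # assumed known reward table
model_grad = jax.grad(control_objective)(theta, rewards)
print(model_grad.shape)        # (5, 2, 5): a control-oriented update direction for the model

Here the model gradient is obtained by differentiating through the unrolled inner loop; the next sketch replaces that unrolling with implicit differentiation, which is the point of the paper's title.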
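
The "implicit differentiation" in the title refers to computing that model gradient at the fixed point Q* = T_θ(Q*) rather than backpropagating through the unrolled iterations. The sketch below applies the implicit function theorem and solves the resulting adjoint linear system; the name implicit_grad and the dense solve are illustrative choices under the same toy assumptions, not the authors' implementation (the paper relies on further approximations, described in its Section 6, to scale this up).

import jax
import jax.numpy as jnp

def implicit_grad(bellman_op, theta, q_star, outer_loss):
    # Gradient of outer_loss(Q*) w.r.t. theta, where Q* = bellman_op(Q*, theta).
    # Implicit function theorem: dQ*/dtheta = (I - dT/dQ)^{-1} dT/dtheta, hence
    # dL/dtheta = lam^T dT/dtheta with (I - dT/dQ)^T lam = dL/dQ*.
    flat_q = q_star.ravel()
    n = flat_q.size

    def op_flat(q_flat, th):
        return bellman_op(q_flat.reshape(q_star.shape), th).ravel()

    dT_dq = jax.jacobian(op_flat, argnums=0)(flat_q, theta)                    # (n, n)
    dL_dq = jax.grad(lambda q: outer_loss(q.reshape(q_star.shape)))(flat_q)    # (n,)

    lam = jnp.linalg.solve((jnp.eye(n) - dT_dq).T, dL_dq)                      # adjoint solve

    # Contract the adjoint vector with dT/dtheta via a vector-Jacobian product,
    # so the full Jacobian with respect to theta is never materialized.
    _, pullback = jax.vjp(lambda th: op_flat(flat_q, th), theta)
    (d_theta,) = pullback(lam)
    return d_theta

# Usage with the hypothetical names from the previous sketch:
# q_star = solve_q(theta, rewards)
# g = implicit_grad(lambda q, th: soft_bellman_update(q, th, rewards),
#                   theta, q_star,
#                   lambda q: ALPHA * jax.nn.logsumexp(q[0] / ALPHA))

The dense n-by-n adjoint solve is only viable for tabular toys; with function approximation one would need iterative or approximate solvers, which is presumably what the Section 6 approximations quoted in the Research Type row address.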