Correlation Priors for Reinforcement Learning

Authors: Bastian Alt, Adrian Šošić, Heinz Koeppl

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the framework on a number of common decision-making-related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning.
Researcher Affiliation | Academia | Technische Universität Darmstadt; {bastian.alt, adrian.sosic, heinz.koeppl}@bcs.tu-darmstadt.de
Pseudocode | No | The paper provides detailed mathematical derivations and descriptions of the variational inference method, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The accompanying code is publicly available via Git: https://git.rwth-aachen.de/bcs/correlation_priors_for_rl
Open Datasets | No | The paper describes using generated demonstration data sets and data from simulated environments, but does not provide access information or citations for any publicly available or open datasets.
Dataset Splits | No | The paper describes evaluating models on observed data and simulated environments, but does not provide specific train/validation/test dataset split information.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For the experiments in the following section, we consider a squared exponential covariance function of the form κ(c, c′) = exp[−d(c, c′)² / ℓ²], with a covariate distance measure d : C × C → R≥0 and a length scale ℓ ∈ R≥0 adapted to the specific modeling scenario. To capture the underlying correlations, we used the Euclidean distance between the grid positions as the covariate distance measure d and set ℓ to the maximum occurring distance value. Fig. 2c shows the policy model obtained by averaging the predictive action distributions of M = 100 drawn subgoal configurations.
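
The experiment-setup row above describes a squared exponential covariance over grid positions with the length scale set to the maximum occurring distance. The following is a minimal sketch of that construction, not the authors' released code: the grid size, function name, and use of NumPy are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' implementation): build a squared
# exponential covariance matrix kappa(c, c') = exp(-d(c, c')^2 / l^2) over the
# cells of a 2-D grid, using the Euclidean distance between grid positions and
# a length scale l equal to the maximum occurring distance, as described above.
import numpy as np

def squared_exponential_cov(grid_shape=(10, 10)):
    # Enumerate all grid positions as covariates c = (row, col).
    rows, cols = np.meshgrid(np.arange(grid_shape[0]),
                             np.arange(grid_shape[1]), indexing="ij")
    positions = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

    # Pairwise Euclidean distances d(c, c') between all grid positions.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

    # Length scale set to the maximum occurring distance value.
    length_scale = dists.max()

    # Squared exponential covariance matrix.
    return np.exp(-dists ** 2 / length_scale ** 2)

K = squared_exponential_cov((10, 10))
print(K.shape)  # (100, 100): one row/column per grid cell
```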