Correlation Priors for Reinforcement Learning

Authors: Bastian Alt, Adrian Šošić, Heinz Koeppl

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the framework on a number of common decision-making-related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning.
Researcher Affiliation | Academia | Technische Universität Darmstadt; {bastian.alt, adrian.sosic, heinz.koeppl}@bcs.tu-darmstadt.de
Pseudocode | No | The paper provides detailed mathematical derivations and descriptions of the variational inference method, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The accompanying code is publicly available via Git: https://git.rwth-aachen.de/bcs/correlation_priors_for_rl
Open Datasets | No | The paper describes using generated demonstration data sets and data from simulated environments, but does not provide access information or citations for any publicly available or open datasets.
Dataset Splits | No | The paper describes evaluating models on observed data and simulated environments, but does not provide specific train/validation/test dataset split information.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For the experiments in the following section, we consider a squared exponential covariance function of the form κ(c, c′) = exp[−d(c, c′)² / ℓ²], with a covariate distance measure d : C × C → R≥0 and a length scale ℓ ∈ R≥0 adapted to the specific modeling scenario. To capture the underlying correlations, we used the Euclidean distance between the grid positions as the covariate distance measure d and set ℓ to the maximum occurring distance value. Fig. 2c shows the policy model obtained by averaging the predictive action distributions of M = 100 drawn subgoal configurations.
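
The experiment-setup row above describes a squared exponential covariance over grid positions with the length scale set to the maximum occurring distance. The following is a minimal sketch of that construction, not the authors' released code: the grid size, function name, and use of NumPy are assumptions made for illustration.

```python
# Illustrative sketch only (not the authors' implementation): build a squared
# exponential covariance matrix kappa(c, c') = exp(-d(c, c')^2 / l^2) over the
# cells of a 2-D grid, using the Euclidean distance between grid positions and
# a length scale l equal to the maximum occurring distance, as described above.
import numpy as np

def squared_exponential_cov(grid_shape=(10, 10)):
    # Enumerate all grid positions as covariates c = (row, col).
    rows, cols = np.meshgrid(np.arange(grid_shape[0]),
                             np.arange(grid_shape[1]), indexing="ij")
    positions = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

    # Pairwise Euclidean distances d(c, c') between all grid positions.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

    # Length scale set to the maximum occurring distance value.
    length_scale = dists.max()

    # Squared exponential covariance matrix.
    return np.exp(-dists ** 2 / length_scale ** 2)

K = squared_exponential_cov((10, 10))
print(K.shape)  # (100, 100): one row/column per grid cell
```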