Correlation Priors for Reinforcement Learning
Authors: Bastian Alt, Adrian Šošić, Heinz Koeppl
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the framework on a number of common decision-making related problems, such as imitation learning, subgoal extraction, system identification and Bayesian reinforcement learning. |
| Researcher Affiliation | Academia | Technische Universität Darmstadt {bastian.alt, adrian.sosic, heinz.koeppl}@bcs.tu-darmstadt.de |
| Pseudocode | No | The paper provides detailed mathematical derivations and descriptions of the variational inference method, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The accompanying code is publicly available via Git. https://git.rwth-aachen.de/bcs/correlation_priors_for_rl |
| Open Datasets | No | The paper describes using generated demonstration data sets and data from simulated environments, but does not provide access information or citations for any publicly available or open datasets. |
| Dataset Splits | No | The paper describes evaluating models on observed data and simulated environments, but does not provide specific train/validation/test dataset split information. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the experiments in the following section, we consider a squared exponential covariance function of the form $(\Sigma)_{cc'} = \exp\left[-d(c, c')^2 / l^2\right]$, with a covariate distance measure $d : \mathcal{C} \times \mathcal{C} \to \mathbb{R}_{\geq 0}$ and a length scale $l \in \mathbb{R}_{\geq 0}$ adapted to the specific modeling scenario. To capture the underlying correlations, we used the Euclidean distance between the grid positions as covariate distance measure $d$ and set $l$ to the maximum occurring distance value. Fig. 2c shows the policy model obtained by averaging the predictive action distributions of M = 100 drawn subgoal configurations. |
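
The covariance construction quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `squared_exponential_covariance`, the 3×3 grid, and the default of setting the length scale to the largest pairwise distance when none is given are assumptions made here to mirror the quoted description.

```python
import math


def squared_exponential_covariance(positions, length_scale=None):
    """Return the matrix with entries exp(-d(c, c')^2 / l^2).

    positions: list of coordinate tuples (the covariates c).
    length_scale: l; if None, it is set to the maximum pairwise
    Euclidean distance, as in the quoted experiment description.
    """
    # Pairwise Euclidean distances d(c, c') between all covariates.
    dist = [[math.dist(p, q) for q in positions] for p in positions]
    if length_scale is None:
        length_scale = max(max(row) for row in dist)
    if length_scale <= 0:
        raise ValueError("length scale must be positive")
    return [[math.exp(-(d / length_scale) ** 2) for d in row] for row in dist]


# Hypothetical 3x3 grid world; the grid size is chosen only for illustration.
rows, cols = 3, 3
positions = [(r, c) for r in range(rows) for c in range(cols)]
cov = squared_exponential_covariance(positions)
```

With the length scale tied to the maximum distance, the two most distant grid cells have covariance exp(-1), so every pair of cells stays noticeably correlated; a smaller `length_scale` would make the correlation decay faster with distance.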