A Variational Perturbative Approach to Planning in Graph-Based Markov Decision Processes

Authors: Dominik Linzner, Heinz Koeppl

AAAI 2020, pp. 7203-7210

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method to different standard GMDP problems. We show that in cases where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of a random guess. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.
Researcher Affiliation | Academia | Dominik Linzner (1), Heinz Koeppl (1,2); (1) Department of Electrical Engineering and Information Technology, (2) Department of Biology, Technische Universität Darmstadt
Pseudocode | Yes | Algorithm 1: Stationary points of the Euler-Lagrange equation; Algorithm 2: Expectation-Maximization for Planning. (A generic, illustrative sketch of the EM planning pattern follows the table.)
Open Source Code | Yes | An implementation of our method is available via Git: https://git.rwth-aachen.de/bcs/vpt-planning
Open Datasets | No | The paper refers to standard GMDP problems such as 'disease control', 'forest management', and the 'voter model', but does not provide concrete access information (link, DOI, formal citation) for the specific problem instances used in the experiments.
Dataset Splits | No | The paper describes problem settings and network topologies (e.g., a 2 × 3 grid) but does not specify explicit train/validation/test dataset splits or percentages.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions comparing against methods from the GMDPtoolbox but does not list version numbers for its own software dependencies, such as Python libraries or other tools used in its implementation.
Experiment Setup | Yes | For all experiments, we set the discount factor to γ = 0.9 and the atomic reward to r = 1. (The illustrative sketch below reuses these two values.)
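
The Pseudocode and Experiment Setup rows above name an Expectation-Maximization scheme for planning and report the two parameters shared by all experiments. For orientation only, here is a minimal, generic EM-for-planning loop on a toy two-state MDP that reuses the reported values γ = 0.9 and r = 1. It sketches the reward-weighted EM policy update familiar from the planning-as-inference literature, not the authors' variational perturbative Algorithm 2; the toy MDP, its transitions, and the reward placement are invented for illustration.

```python
# Minimal, generic EM-style planning loop on a toy 2-state MDP.
# NOT the paper's Algorithm 2 (a variational perturbative scheme for
# graph-based MDPs); it only illustrates the reward-weighted EM policy
# update that "Expectation-Maximization for Planning" generally refers to.
import numpy as np

gamma = 0.9  # discount factor, as reported in the paper
r = 1.0      # atomic reward, as reported in the paper

# Hypothetical toy MDP (invented for illustration): 2 states, 2 actions.
# P[a, s, s2] = transition probability; R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],   # action 0
              [[0.1, 0.9],
               [0.7, 0.3]]])  # action 1
R = np.array([[0.0, r],       # in state 0, action 1 earns the atomic reward
              [r, 0.0]])      # in state 1, action 0 earns it

pi = np.full((2, 2), 0.5)     # pi[s, a]: start from the uniform policy

for _ in range(100):
    # E-step: evaluate the current policy by solving V = R_pi + gamma * P_pi V.
    R_pi = np.einsum('sa,sa->s', pi, R)    # expected reward per state
    P_pi = np.einsum('sa,ast->st', pi, P)  # state transitions under pi
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # action values

    # M-step: reward-weighted update, pi(a|s) proportional to pi(a|s) * Q(s, a)
    # (valid here because all rewards, hence all Q-values, are nonnegative).
    pi = pi * Q
    pi /= pi.sum(axis=1, keepdims=True)

print("policy pi(a|s):\n", pi)  # concentrates on the optimal deterministic policy
print("state values V:", V)
```

On this toy problem the E-step is an exact linear solve; in graph-based MDPs such exact policy evaluation is intractable in the number of nodes, which is the regime the paper's variational approach is built for.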