Inference and Learning in Dynamic Decision Networks Using Knowledge Compilation

Authors: Gabriele Venturato, Vincent Derkinderen, Pedro Zuidberg Dos Martires, Luc De Raedt

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 6 (Experiments): We performed our experimental evaluation with an Intel CPU E3-1225 v3 @ 3.20 GHz and 32 GB of memory. All experiments ran 10 times and we report the average run time. We omit the variance when negligible. The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1.
Researcher Affiliation | Academia | 1 KU Leuven, Belgium; 2 Örebro University, Sweden
Pseudocode | Yes | Algorithm 1: Value Iteration with DDCs (a generic value-iteration sketch using the reported discount factor and tolerance is given after the table)
Open Source Code | Yes | https://github.com/ML-KULeuven/mapl-cirup
Open Datasets | Yes | To address this question, we evaluate mapl-cirup on three MDP instances of different sizes from the SPUDD repository: elevator, coffee, and factory. We additionally include the monkey instance of Example 1... To generate the dataset we sampled the coffee example enriched with extra reward parameters to make it more challenging, and initialised them with values sampled uniformly from the interval of integers [1, 10].
Dataset Splits | No | The paper describes hyperparameters for mapl-cirup and training settings for the learning task (e.g., batch size, learning rate), but it does not specify explicit train/validation/test dataset splits or cross-validation details for the primary experiments.
Hardware Specification | Yes | We performed our experimental evaluation with an Intel CPU E3-1225 v3 @ 3.20 GHz and 32 GB of memory.
Software Dependencies | No | The paper mentions using the PySDD and Numba packages, and Python 3, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1. As a timeout, we use 600 s of total run time (indicated by a dashed line on the figures). The Adam optimiser (Kingma and Ba 2015) was used with learning rate = 0.1, ε = 10⁻⁷, and the rest of the parameters set as default. Moreover, we initialised the reward parameters with values sampled uniformly from the interval of integers [-30, 30]. The dataset contains 100 trajectories (|E| = 100), each of length 5 (k = 5). The training was performed on batches of size 10. (A hedged sketch of this training setup is given after the table.)
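
The paper's Algorithm 1 runs value iteration over dynamic decision circuits (DDCs), which is not reproduced here. As a rough illustration of how the reported hyperparameters (discount factor 0.9, tolerance 0.1) typically enter such a procedure, the following is a minimal sketch of standard tabular value iteration; the explicit `transition` and `reward` tables are hypothetical stand-ins, not the paper's circuit-based representation.

```python
# Minimal sketch of tabular value iteration, illustrating how a discount
# factor of 0.9 and a convergence tolerance of 0.1 are typically used.
# NOT the paper's Algorithm 1, which operates on compiled DDCs.

def value_iteration(states, actions, transition, reward,
                    discount=0.9, tolerance=0.1):
    """transition[s][a] -> list of (next_state, prob); reward[s][a] -> float."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected value over all actions.
            best = max(
                reward[s][a] + discount * sum(p * values[s2]
                                              for s2, p in transition[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        # Stop once the largest value update falls below the tolerance.
        if delta < tolerance:
            return values
```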
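The "Experiment Setup" row lists the learning hyperparameters. Below is a hedged sketch of how they could map onto a standard PyTorch training loop; the paper's actual implementation differentiates through compiled circuits, so the parameter tensor, synthetic trajectories, and squared-error loss used here are placeholders for illustration only.

```python
# Hypothetical sketch of the reported learning setup, assuming a PyTorch-style
# pipeline. Only the stated hyperparameters (learning rate, eps, initialisation
# range, dataset size, trajectory length, batch size) come from the paper;
# everything else is a placeholder.
import torch

NUM_TRAJECTORIES = 100   # |E| = 100
TRAJECTORY_LENGTH = 5    # k = 5
BATCH_SIZE = 10
NUM_REWARD_PARAMS = 8    # arbitrary placeholder count

# Reward parameters initialised uniformly from the integer interval [-30, 30].
reward_params = torch.nn.Parameter(
    torch.randint(-30, 31, (NUM_REWARD_PARAMS,)).float()
)

# Adam with learning rate 0.1 and eps = 1e-7; other arguments left at defaults.
optimizer = torch.optim.Adam([reward_params], lr=0.1, eps=1e-7)

# Synthetic stand-in dataset: one observed reward per trajectory step.
observed = torch.randn(NUM_TRAJECTORIES, TRAJECTORY_LENGTH)

for batch in observed.split(BATCH_SIZE):
    optimizer.zero_grad()
    # Placeholder loss: in the paper, predicted rewards come from inference in
    # the compiled circuit; here we simply regress the parameter mean onto the data.
    loss = ((reward_params.mean() - batch) ** 2).mean()
    loss.backward()
    optimizer.step()
```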