Inference and Learning in Dynamic Decision Networks Using Knowledge Compilation
Authors: Gabriele Venturato, Vincent Derkinderen, Pedro Zuidberg Dos Martires, Luc De Raedt
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed our experimental evaluation with an Intel CPU E3-1225v3 @3.20 GHz and 32 GB of memory. All experiments ran 10 times and we report the average run time. We omit the variance when negligible. The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1. |
| Researcher Affiliation | Academia | KU Leuven, Belgium; Örebro University, Sweden |
| Pseudocode | Yes | Algorithm 1: Value Iteration with DDCs |
| Open Source Code | Yes | https://github.com/ML-KULeuven/mapl-cirup |
| Open Datasets | Yes | To address this question, we evaluate mapl-cirup on three MDP instances of different sizes from the SPUDD repository: elevator, coffee, and factory. We additionally include the monkey instance of Example 1... To generate the dataset we sampled the coffee example enriched with extra reward parameters to make it more challenging, and initialised them with values sampled uniformly from the interval of integers [1, 10]. |
| Dataset Splits | No | The paper describes hyper-parameters for mapl-cirup and training settings for the learning task (e.g., batch size, learning rate), but it does not specify explicit train/validation/test dataset splits or cross-validation details for the primary experiments. |
| Hardware Specification | Yes | We performed our experimental evaluation with an Intel CPU E3-1225v3 @3.20 GHz and 32 GB of memory. |
| Software Dependencies | No | The paper mentions using the PySDD and Numba packages, and Python 3, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1. As a timeout, we use 600 s of total run time (indicated by a dashed line on the figures). The Adam optimiser (Kingma and Ba 2015) was used with learning rate = 0.1, ε = 10^-7, and the rest of the parameters set as default. Moreover, we initialised the reward parameters with values sampled uniformly from the interval of integers [-30, 30]. The dataset contains 100 trajectories (|E| = 100), each of length 5 (k = 5). The training was performed on batches of size 10. |
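The reported inference hyperparameters (discount factor 0.9, convergence tolerance 0.1) parameterise the value-iteration loop of Algorithm 1. As a rough illustration of how those two numbers interact, here is a minimal tabular value-iteration sketch; the two-state MDP and all names in it are hypothetical, and mapl-cirup itself operates on dynamic decision circuits rather than explicit tables.

```python
GAMMA = 0.9      # discount factor, as reported in the paper
TOLERANCE = 0.1  # convergence tolerance, as reported in the paper

def value_iteration(states, actions, transition, reward,
                    gamma=GAMMA, tol=TOLERANCE):
    """Repeat Bellman backups until the largest value change drops below tol.

    transition[s][a] is a list of (next_state, probability) pairs;
    reward[s][a] is the immediate reward for taking action a in state s.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward[s][a] + gamma * sum(p * values[s2]
                                           for s2, p in transition[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

# Tiny two-state MDP (hypothetical, for illustration only).
states = ["s0", "s1"]
actions = ["stay", "go"]
transition = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 1.0)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
reward = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}
V = value_iteration(states, actions, transition, reward)
```

With these settings the values converge geometrically at rate 0.9, so the loop stops after roughly 30 sweeps once per-sweep changes fall below 0.1; a tighter tolerance would trade more sweeps for values closer to the fixed point.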