Inference and Learning in Dynamic Decision Networks Using Knowledge Compilation

Authors: Gabriele Venturato, Vincent Derkinderen, Pedro Zuidberg Dos Martires, Luc De Raedt

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 6 (Experiments): We performed our experimental evaluation with an Intel CPU E3-1225 v3 @ 3.20 GHz and 32 GB of memory. All experiments ran 10 times and we report the average run time. We omit the variance when negligible. The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1.
Researcher Affiliation | Academia | 1 KU Leuven, Belgium; 2 Örebro University, Sweden
Pseudocode | Yes | Algorithm 1: Value Iteration with DDCs (a generic value-iteration sketch using the reported discount factor and tolerance is given after the table)
Open Source Code | Yes | https://github.com/ML-KULeuven/mapl-cirup
Open Datasets | Yes | To address this question, we evaluate mapl-cirup on three MDP instances of different sizes from the SPUDD repository: elevator, coffee, and factory. We additionally include the monkey instance of Example 1... To generate the dataset we sampled the coffee example enriched with extra reward parameters to make it more challenging, and initialised them with values sampled uniformly from the interval of integers [1, 10].
Dataset Splits | No | The paper describes hyperparameters for mapl-cirup and training settings for the learning task (e.g., batch size, learning rate), but it does not specify explicit train/validation/test dataset splits or cross-validation details for the primary experiments.
Hardware Specification | Yes | We performed our experimental evaluation with an Intel CPU E3-1225 v3 @ 3.20 GHz and 32 GB of memory.
Software Dependencies | No | The paper mentions using the PySDD and Numba packages, and Python 3, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The hyperparameters for mapl-cirup were as follows: discount factor = 0.9 and tolerance = 0.1. As a timeout, we use 600 s of total run time (indicated by a dashed line on the figures). The Adam optimiser (Kingma and Ba 2015) was used with learning rate = 0.1, ε = 10⁻⁷, and the rest of the parameters set as default. Moreover, we initialised the reward parameters with values sampled uniformly from the interval of integers [-30, 30]. The dataset contains 100 trajectories (|E| = 100), each of length 5 (k = 5). The training was performed on batches of size 10. (A hedged sketch of this training setup is given after the table.)
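
The paper's Algorithm 1 runs value iteration over dynamic decision circuits (DDCs), which is not reproduced here. As a rough illustration of how the reported hyperparameters (discount factor 0.9, tolerance 0.1) typically enter such a procedure, the following is a minimal sketch of standard tabular value iteration; the explicit `transition` and `reward` tables are hypothetical stand-ins, not the paper's circuit-based representation.

```python
# Minimal sketch of tabular value iteration, illustrating how a discount
# factor of 0.9 and a convergence tolerance of 0.1 are typically used.
# NOT the paper's Algorithm 1, which operates on compiled DDCs.

def value_iteration(states, actions, transition, reward,
                    discount=0.9, tolerance=0.1):
    """transition[s][a] -> list of (next_state, prob); reward[s][a] -> float."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected value over all actions.
            best = max(
                reward[s][a] + discount * sum(p * values[s2]
                                              for s2, p in transition[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        # Stop once the largest value update falls below the tolerance.
        if delta < tolerance:
            return values
```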
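The "Experiment Setup" row lists the learning hyperparameters. Below is a hedged sketch of how they could map onto a standard PyTorch training loop; the paper's actual implementation differentiates through compiled circuits, so the parameter tensor, synthetic trajectories, and squared-error loss used here are placeholders for illustration only.

```python
# Hypothetical sketch of the reported learning setup, assuming a PyTorch-style
# pipeline. Only the stated hyperparameters (learning rate, eps, initialisation
# range, dataset size, trajectory length, batch size) come from the paper;
# everything else is a placeholder.
import torch

NUM_TRAJECTORIES = 100   # |E| = 100
TRAJECTORY_LENGTH = 5    # k = 5
BATCH_SIZE = 10
NUM_REWARD_PARAMS = 8    # arbitrary placeholder count

# Reward parameters initialised uniformly from the integer interval [-30, 30].
reward_params = torch.nn.Parameter(
    torch.randint(-30, 31, (NUM_REWARD_PARAMS,)).float()
)

# Adam with learning rate 0.1 and eps = 1e-7; other arguments left at defaults.
optimizer = torch.optim.Adam([reward_params], lr=0.1, eps=1e-7)

# Synthetic stand-in dataset: one observed reward per trajectory step.
observed = torch.randn(NUM_TRAJECTORIES, TRAJECTORY_LENGTH)

for batch in observed.split(BATCH_SIZE):
    optimizer.zero_grad()
    # Placeholder loss: in the paper, predicted rewards come from inference in
    # the compiled circuit; here we simply regress the parameter mean onto the data.
    loss = ((reward_params.mean() - batch) ** 2).mean()
    loss.backward()
    optimizer.step()
```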