Online Markov Decision Processes Configuration with Continuous Decision Space

Authors: Davide Maran, Pierriccardo Olivieri, Francesco Emanuele Stradi, Giuseppe Urso, Nicola Gatti, Marcello Restelli

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we compare the empiric performance of our algorithms with a baseline in synthetic experiments.
Researcher Affiliation | Academia | Politecnico di Milano, {davide.maran, pierriccardo.olivieri, francescoemanuele.stradi, nicola.gatti, marcello.restelli}@polimi.it, giuseppe.urso@mail.polimi.it
Pseudocode | Yes | Algorithm 1: Agent-Configurator Interaction ... Algorithm 2: O-DOSC Algorithm ... Algorithm 3: O-SOSC Algorithm
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | No | The paper mentions 'synthetic experiments' but does not provide concrete access information (link, DOI, specific citation with author/year) for a publicly available dataset, nor does it specify exact dataset splits for training.
Dataset Splits | No | The paper describes 'synthetic experiments' but does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions 'synthetic experiments'.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper states 'For reasons of space, the description of the experimental settings and additional details on the experimental results can be found in the Appendix.' and describes the MDP structure used in experiments (four layers, two states, two actions). However, it does not contain specific hyperparameters, training configurations, or system-level settings within the main text.