reproducibilityindex.ai

Configurable Markov Decision Processes

Authors: Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"We present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy." and "The experiments are conducted on two explicative domains: the Student-Teacher domain (unconstrained model space) and the Racetrack Simulator (parametric model space)."
Researcher Affiliation	Academia	Alberto Maria Metelli¹*, Mirco Mutti¹*, Marcello Restelli¹. ¹Politecnico di Milano, 32, Piazza Leonardo da Vinci, Milan, Italy. Correspondence to: Alberto Maria Metelli <albertomaria.metelli@polimi.it>.
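Pseudocode	Yes	Algorithm 1 Safe Policy Model Iteration: initialize $\pi_0, P_0$. for $i = 0, 1, 2, \dots$ until $\epsilon$-convergence do $\bar{\pi}_i = \mathrm{PolicyChooser}(\pi_i)$; $\bar{P}_i = \mathrm{ModelChooser}(P_i)$; $V_i = \{(\alpha^*_{0,i}, 0), (\alpha^*_{1,i}, 1), (0, \beta^*_{0,i}), (1, \beta^*_{1,i})\}$; $\alpha^*_i, \beta^*_i = \arg\max_{\alpha,\beta} \{B(\alpha, \beta) : (\alpha, \beta) \in V_i\}$; $\pi_{i+1} = \alpha^*_i \bar{\pi}_i + (1 - \alpha^*_i) \pi_i$; $P_{i+1} = \beta^*_i \bar{P}_i + (1 - \beta^*_i) P_i$ end for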
Open Source Code	No	The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets	No	The paper describes custom-built environments ('Student-Teacher domain' and 'Racetrack simulator') rather than using publicly available or open datasets, and no access information is provided for these environments.
Dataset Splits	No	The paper does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). The experiments are conducted in simulated environments where data is generated rather than partitioned from a fixed dataset.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific solvers).
Experiment Setup	No	The paper describes the simulated environments used for experiments but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings.
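
The Algorithm 1 loop quoted in the Pseudocode row can be illustrated with a small sketch of a single iteration. This is not the authors' code (the report notes that none was released). It only shows the two steps the pseudocode makes explicit: picking the candidate pair that maximizes the bound, then forming convex combinations of the current and target policy and model. The names `mix`, `spmi_update`, and `toy_bound`, the candidate values, and the nested-list representation are all assumptions made for illustration. The bound B and the candidate step sizes are not defined on this page, so they are passed in as placeholders.

```python
def mix(weight, target, current):
    """Return weight * target + (1 - weight) * current, elementwise on nested lists."""
    if isinstance(current, list):
        return [mix(weight, t, c) for t, c in zip(target, current)]
    return weight * target + (1.0 - weight) * current


def spmi_update(pi, P, pi_target, P_target, candidates, bound):
    """One iteration of the loop body (hypothetical helper, not from the paper).

    pi, pi_target: nested lists [state][action] of action probabilities.
    P, P_target:   nested lists [state][action][next_state] of transition probabilities.
    candidates:    list of (alpha, beta) pairs standing in for the set V_i.
    bound:         callable (alpha, beta) -> float standing in for B(alpha, beta).
    """
    # Select the candidate pair with the largest bound value.
    alpha, beta = max(candidates, key=lambda ab: bound(ab[0], ab[1]))
    # Move the policy and the model toward their targets by alpha and beta.
    pi_next = mix(alpha, pi_target, pi)
    P_next = mix(beta, P_target, P)
    return pi_next, P_next, alpha, beta


# Toy problem with 2 states and 2 actions; every number here is made up.
pi = [[0.5, 0.5], [0.5, 0.5]]
pi_target = [[1.0, 0.0], [0.0, 1.0]]
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
P_target = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]

# Four candidates, each with alpha or beta fixed at 0 or 1, mirroring the shape of V_i.
candidates = [(0.5, 0.0), (1.0, 1.0), (0.0, 0.25), (1.0, 0.75)]
toy_bound = lambda a, b: -((a - 0.5) ** 2) - b ** 2  # placeholder, not the paper's bound

pi_next, P_next, alpha, beta = spmi_update(pi, P, pi_target, P_target, candidates, toy_bound)
```

Because each update is a convex combination, every row of `pi_next` and `P_next` remains a probability distribution whenever the inputs are distributions and the coefficients lie in [0, 1].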