Configurable Markov Decision Processes
Authors: Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy." and "The experiments are conducted on two explicative domains: the Student-Teacher domain (unconstrained model space) and the Racetrack Simulator (parametric model space)." |
| Researcher Affiliation | Academia | Alberto Maria Metelli¹*, Mirco Mutti¹*, Marcello Restelli¹ (¹Politecnico di Milano, 32, Piazza Leonardo da Vinci, Milan, Italy). Correspondence to: Alberto Maria Metelli <albertomaria.metelli@polimi.it>. |
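| Pseudocode | Yes | Algorithm 1 Safe Policy Model Iteration: initialize $\pi_0, P_0$; for $i = 0, 1, 2, \dots$ until $\epsilon$-convergence do: $\bar{\pi}_i = \mathrm{PolicyChooser}(\pi_i)$; $\bar{P}_i = \mathrm{ModelChooser}(P_i)$; $V_i = \{(\alpha^*_{0,i}, 0), (\alpha^*_{1,i}, 1), (0, \beta^*_{0,i}), (1, \beta^*_{1,i})\}$; $\alpha^*_i, \beta^*_i = \arg\max_{\alpha,\beta}\{B(\alpha, \beta) : (\alpha, \beta) \in V_i\}$; $\pi_{i+1} = \alpha^*_i \bar{\pi}_i + (1 - \alpha^*_i)\pi_i$; $P_{i+1} = \beta^*_i \bar{P}_i + (1 - \beta^*_i) P_i$; end for |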
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes custom-built environments ('Student-Teacher domain' and 'Racetrack simulator') rather than using publicly available or open datasets, and no access information is provided for these environments. |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). The experiments are conducted in simulated environments where data is generated rather than partitioned from a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific solvers). |
| Experiment Setup | No | The paper describes the simulated environments used for experiments but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings. |
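Since no code is released, the following is a minimal Python/NumPy sketch of the Safe Policy Model Iteration loop quoted in the Pseudocode row, run on a small random tabular Conf-MDP. It is not the authors' implementation. Several parts are assumptions: the paper's bound $B(\alpha, \beta)$ is replaced by exact evaluation of the expected return $J$ over a hypothetical 11-point grid of step sizes; `policy_chooser` is a greedy policy on $Q$; `model_chooser` is a hypothetical greedy rule that, for each $(s, a)$, picks the row of a finite list of candidate models with the highest expected next-state value. The names `GAMMA`, `ALPHAS`, `eps`, and every function name are illustrative only.

```python
import numpy as np

GAMMA = 0.9                        # assumed discount factor
ALPHAS = np.linspace(0.0, 1.0, 11)  # hypothetical step-size grid (stand-in for maximizing B)


def evaluate(pi, P, R, mu):
    """Exact expected discounted return J and state values V of policy pi under model P."""
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)    # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, r_pi)
    return float(mu @ V), V


def policy_chooser(pi, P, R, mu):
    """Greedy target policy: deterministic argmax of Q under the current model."""
    _, V = evaluate(pi, P, R, mu)
    Q = R + GAMMA * np.einsum("sat,t->sa", P, V)
    target = np.zeros_like(pi)
    target[np.arange(pi.shape[0]), Q.argmax(axis=1)] = 1.0
    return target


def model_chooser(pi, P, R, mu, candidates):
    """Hypothetical greedy target model: for each (s, a), the candidate row
    (candidates has shape (K, S, A, S)) with the highest expected next-state value."""
    _, V = evaluate(pi, P, R, mu)
    scores = np.einsum("ksat,t->ksa", candidates, V)  # (K, S, A)
    best = scores.argmax(axis=0)                       # (S, A)
    s_idx, a_idx = np.indices(best.shape)
    return candidates[best, s_idx, a_idx]              # (S, A, S)


def spmi(pi, P, R, mu, candidates, eps=1e-6, max_iter=200):
    history = [evaluate(pi, P, R, mu)[0]]
    for _ in range(max_iter):
        pi_bar = policy_chooser(pi, P, R, mu)
        P_bar = model_chooser(pi, P, R, mu, candidates)

        # Exact return of the mixture (alpha, beta); replaces the paper's bound B.
        def J(alpha, beta):
            return evaluate(alpha * pi_bar + (1 - alpha) * pi,
                            beta * P_bar + (1 - beta) * P, R, mu)[0]

        # V_i: best alpha for beta in {0, 1}, then best beta for alpha in {0, 1}.
        V_i = [(max(ALPHAS, key=lambda a: J(a, b)), b) for b in (0.0, 1.0)]
        V_i += [(a, max(ALPHAS, key=lambda b: J(a, b))) for a in (0.0, 1.0)]
        alpha, beta = max(V_i, key=lambda ab: J(*ab))

        # Convex updates of policy and model, as in Algorithm 1.
        pi = alpha * pi_bar + (1 - alpha) * pi
        P = beta * P_bar + (1 - beta) * P
        history.append(evaluate(pi, P, R, mu)[0])
        if history[-1] - history[-2] < eps:
            break
    return pi, P, history


# Usage on a random 4-state, 2-action Conf-MDP with 3 candidate models.
rng = np.random.default_rng(0)
S, A, K = 4, 2, 3
R = rng.uniform(size=(S, A))
mu = np.full(S, 1.0 / S)
candidates = rng.dirichlet(np.ones(S), size=(K, S, A))  # shape (K, S, A, S)
pi0 = np.full((S, A), 1.0 / A)
P0 = candidates.mean(axis=0)  # start inside the convex hull of the candidates
pi, P, history = spmi(pi0, P0, R, mu, candidates)
print(f"J: {history[0]:.3f} -> {history[-1]:.3f} in {len(history) - 1} iterations")
```

Because $\alpha = 0$ (or $\beta = 0$) is always on the grid, the pair $(0, 0)$ is implicitly available at every iteration, so the recorded return never decreases in this sketch. This mimics the monotone improvement that the bound $B$ is meant to guarantee; with the real bound, the step sizes would instead come from maximizing $B$ rather than from exact evaluation.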