Learning in Non-Cooperative Configurable Markov Decision Processes
Authors: Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we empirically validate the performance of our algorithm in simulated domains. ... In this section, we provide the experimental evaluation of our algorithms in two different settings: when the policies are stochastic and when the policies are deterministic. |
| Researcher Affiliation | Academia | Giorgia Ramponi, ETH AI Center, Zurich, Switzerland, gramponi@ethz.ch; Alberto Maria Metelli, Politecnico di Milano, Milan, Italy, albertomaria.metelli@polimi.it; Alessandro Concetti, Politecnico di Milano, Milan, Italy, alessandro.concetti@mail.polimi.it; Marcello Restelli, Politecnico di Milano, Milan, Italy, marcello.restelli@polimi.it |
| Pseudocode | Yes | Algorithm 1: Action-feedback Optimistic Configuration Learning (AfOCL) and Algorithm 2: Reward-feedback Optimistic Configuration Learning (RfOCL) |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplementary material. |
| Open Datasets | No | The paper introduces “two novel environments: Configurable Gridworld and the Student-Teacher.” These are described as simulation environments designed for the experiments, not external datasets with specific public access information provided in the paper. |
| Dataset Splits | No | The paper describes experiments in simulated environments and evaluates performance based on cumulative regret over episodes. It does not mention explicit training, validation, or test dataset splits. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] |
| Software Dependencies | No | The paper describes algorithms and concepts (e.g., UCB1 algorithm, value iteration) but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers). |
| Experiment Setup | Yes | In the first experiment (Figure 1), we considered 10 and 30 configurations with a number of episodes K = 2000 and K = 4000 and horizon H = 10. For this experiment, the agent plays optimal stochastic policies. ... the results with M ∈ {40, 60, 100} and horizon H = 10 are shown. ... 50 runs, 98% c.i. |
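
To connect the Pseudocode and Experiment Setup rows, the sketch below illustrates the general shape of the evaluation protocol: an optimistic learner choosing among M candidate configurations over K episodes, scored by cumulative regret averaged over 50 runs. This is not the paper's Algorithm 1 (AfOCL) or Algorithm 2 (RfOCL); it is a minimal UCB1-style bandit over configurations (UCB1 is the algorithm mentioned in the Software Dependencies row), with simulated Bernoulli episode returns and hypothetical function names, purely for illustration.

```python
import numpy as np

def ucb1_over_configurations(mean_returns, K, seed=None):
    """UCB1-style optimistic selection among M candidate configurations.

    `mean_returns[m]` is the true (hidden) expected episodic return of
    configuration m; episode outcomes are simulated as Bernoulli draws,
    purely for illustration (hypothetical setup, not the paper's code).
    """
    rng = np.random.default_rng(seed)
    M = len(mean_returns)
    counts = np.zeros(M)        # episodes played with each configuration
    means = np.zeros(M)         # empirical mean return per configuration
    best = np.max(mean_returns)
    regret = np.empty(K)

    for k in range(K):
        if k < M:
            m = k               # initialization: try each configuration once
        else:
            bonus = np.sqrt(2.0 * np.log(k) / counts)   # exploration bonus
            m = int(np.argmax(means + bonus))           # optimism in the face of uncertainty
        ret = rng.binomial(1, mean_returns[m])          # simulated episode return
        counts[m] += 1
        means[m] += (ret - means[m]) / counts[m]        # incremental mean update
        regret[k] = best - mean_returns[m]              # per-episode regret

    return np.cumsum(regret)

# 10 configurations, K = 2000 episodes, averaged over 50 runs, mirroring the
# Experiment Setup row; a 98% confidence band could be computed from the run spread.
runs = np.stack([ucb1_over_configurations(np.linspace(0.2, 0.8, 10), K=2000, seed=s)
                 for s in range(50)])
print(runs.mean(axis=0)[-1])   # mean cumulative regret after 2000 episodes
```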
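The Software Dependencies row also mentions value iteration, which the optimistic algorithms use to evaluate candidate configurations. For reference, here is a minimal backward-induction value iteration sketch for a finite-horizon MDP (the experiments use horizon H = 10); the function name, array shapes, and the random test MDP are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def finite_horizon_value_iteration(P, R, H):
    """Backward-induction value iteration for a finite-horizon MDP.

    P: (S, A, S) transition kernel, R: (S, A) expected rewards, H: horizon.
    Returns optimal values V[h, s] and a greedy policy pi[h, s].
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))        # V[H] = 0 at the terminal stage
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):  # sweep backward in time
        Q = R + P @ V[h + 1]        # Q[s, a] = R[s, a] + E_{s'}[V[h+1, s']]
        pi[h] = np.argmax(Q, axis=1)
        V[h] = np.max(Q, axis=1)
    return V, pi

# Tiny random MDP with horizon H = 10, matching the horizon used in the experiments.
rng = np.random.default_rng(0)
S, A, H = 5, 3, 10
P = rng.dirichlet(np.ones(S), size=(S, A))  # rows sum to 1 over next states
R = rng.uniform(size=(S, A))
V, pi = finite_horizon_value_iteration(P, R, H)
print(V[0])                                 # optimal H-step value of each start state
```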