Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs
Authors: Frits de Nijs, Erwin Walraven, Mathijs de Weerdt, Matthijs Spaan
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners. |
| Researcher Affiliation | Academia | Frits de Nijs, Erwin Walraven, Mathijs M. de Weerdt, Matthijs T. J. Spaan Delft University of Technology Mekelweg 4, 2628 CD Delft, The Netherlands |
| Pseudocode | Yes | More details and pseudocode are provided in the supplement, which is available on the homepage of the authors. |
| Open Source Code | No | The paper states 'More details and pseudocode are provided in the supplement, which is available on the homepage of the authors.' and 'See authors homepages for supplementary material on integrating pruning in Column Generation and on the Lottery domain.' This refers to a general homepage, not a direct link to a source-code repository for the methodology described in the paper. |
| Open Datasets | No | The paper uses problem domains such as the 'Lottery problem', 'TCL problem (De Nijs, Spaan, and De Weerdt 2015)', 'Mars rover Maze (Wu and Durfee 2010)', and 'synthetic advertising domain presented by Boutilier and Lu (2016)'. These are references to problem setups defined in prior work, not publicly available datasets with concrete access information or formal data citations. |
| Dataset Splits | No | The paper does not provide specific training, validation, and test dataset splits (e.g., percentages, sample counts, or explicit cross-validation methodology) for its experiments. It refers to problem domains rather than specific datasets with predefined splits. |
| Hardware Specification | Yes | All LPs were solved using Gurobi 7.0 on a 2.1 GHz quad-core i7. |
| Software Dependencies | Yes | All LPs were solved using Gurobi 7.0 on a 2.1 GHz quad-core i7. |
| Experiment Setup | Yes | Frequency of violations is observed through 500,000 Monte Carlo trials. Parameter β was determined to fit the domain, ranging from 3 (advertising) to 100 (lottery). Each TCL agent has 24 states and 2 actions, horizon 24 and 1 resource type (24 resources in total). Our Maze problems have 26 states and 10 actions per agent, horizon 15, and 3 resource types (resulting in 45 resource constraints). |
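The setup row above reports that violation frequency was estimated empirically from 500,000 Monte Carlo trials. As an illustration of that kind of measurement (not the authors' actual code, which is only described in their supplement), the sketch below estimates the probability that the joint resource consumption of several agents exceeds a shared limit; the agent model, consumption probability, and limit are hypothetical toy values.

```python
import random

def estimate_violation_probability(sample_consumption, limit,
                                   trials=500_000, seed=0):
    """Estimate P(total consumption > limit) by Monte Carlo sampling.

    `sample_consumption(rng)` draws one joint consumption value per trial;
    the estimate is simply the fraction of trials that violate the limit.
    """
    rng = random.Random(seed)
    violations = sum(sample_consumption(rng) > limit for _ in range(trials))
    return violations / trials

# Toy example: 10 agents each consume 1 unit with probability 0.3,
# sharing a capacity limit of 5 units.
p = estimate_violation_probability(
    lambda rng: sum(rng.random() < 0.3 for _ in range(10)),
    limit=5,
)
```

With 500,000 trials the standard error of such an estimate is on the order of a few 10^-4, which is why a trial count this large is needed to resolve small violation probabilities reliably.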