Model and Reinforcement Learning for Markov Games with Risk Preferences

Authors: Wenjie Huang, Viet Hai Pham, William Benjamin Haskell

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
Researcher Affiliation | Academia | 1 Shenzhen Research Institute of Big Data (SRIBD); 2 Institute for Data and Decision Analysis, The Chinese University of Hong Kong, Shenzhen; 3 Department of Computer Science, School of Computing, National University of Singapore (NUS); 4 Supply Chain and Operations Management Area, Krannert School of Management, Purdue University
Pseudocode | Yes | Algorithm 1: Risk-aware Nash Q-learning
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | We apply our techniques to the single server exponential queuing system from (Kardes, Ordonez, and Hall 2011).
Dataset Splits | No | The paper describes a queuing system simulation and presents experimental results, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, counts, or predefined citations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using specific algorithms like SASP and discusses an 'interior point algorithm', but it does not list specific software dependencies (e.g., libraries, frameworks, or solvers) with their version numbers that are required to replicate the experiments.
Experiment Setup | Yes | The state space S represents the maximum number (30 in these experiments) of packets allowed in the system. ... The players' risk preferences are obtained by setting αi for i = 1, 2, and we allow α1 ≠ α2. (Table 2: α1 = α2 = 0.1; Table 3: α1 = 0.95, α2 = 0.1 and α1 = 0.1, α2 = 0.95)
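
To make the quoted Experiment Setup row more concrete, the sketch below sets up a toy two-player single-server queuing environment in Python, where the state is the number of packets in the system, capped at 30 as described above. The class name ToyQueueGame, the action sets, the arrival and service rates, and the cost terms are all illustrative assumptions for this sketch; they are not taken from the paper or from (Kardes, Ordonez, and Hall 2011).

import numpy as np

# Toy two-player single-server queue. The state is the number of packets in
# the system, capped at 30 packets as in the quoted experiment setup. The
# action sets, arrival/service rates, and cost terms below are illustrative
# placeholders rather than the paper's parameterization.
MAX_PACKETS = 30

class ToyQueueGame:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = 0  # packets currently in the system

    def reset(self):
        self.state = 0
        return self.state

    def step(self, a1, a2):
        # Each player chooses a service-effort level in {0, 1, 2}.
        arrivals = self.rng.poisson(1.0)            # assumed arrival rate
        served = self.rng.poisson(0.5 * (a1 + a2))  # assumed joint service rate
        self.state = int(np.clip(self.state + arrivals - served, 0, MAX_PACKETS))
        # Assumed per-step costs: a holding cost on the queue plus an effort cost.
        r1 = -self.state - 0.3 * a1
        r2 = -self.state - 0.3 * a2
        return self.state, (r1, r2)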
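
The Pseudocode row refers to the paper's Algorithm 1 (Risk-aware Nash Q-learning). The sketch below is a deliberately simplified reading of that idea, layered on the toy environment above: stage-game values are taken from pure-strategy equilibria found by enumeration (falling back to maximin values), exploration is a crude joint epsilon-greedy rule, and the risk preference αi enters only through a hand-rolled risk_weight function. Every function name, parameter, and simplification here is an assumption for illustration; the paper's actual risk operator, equilibrium computation, and update rule should be taken from Algorithm 1 itself.

import itertools
import numpy as np

def pure_nash_values(Q1, Q2, s, nA):
    # Stage-game values at state s. Sketch only: enumerate pure-strategy Nash
    # equilibria of the bimatrix game (Q1[s], Q2[s]) and fall back to each
    # player's maximin value if none exists; the paper solves the stage games
    # properly (mixed strategies included).
    for a1, a2 in itertools.product(range(nA), repeat=2):
        if Q1[s, a1, a2] >= Q1[s, :, a2].max() and Q2[s, a1, a2] >= Q2[s, a1, :].max():
            return Q1[s, a1, a2], Q2[s, a1, a2]
    return Q1[s].min(axis=1).max(), Q2[s].min(axis=0).max()

def risk_weight(td_error, alpha):
    # Stand-in risk adjustment, assuming (purely for illustration) that a
    # smaller alpha means a more risk-averse player who weights negative
    # surprises more heavily. This is not the paper's risk measure.
    return td_error if td_error >= 0 else td_error / max(alpha, 1e-6)

def risk_aware_nash_q(env, n_states=31, nA=3, alphas=(0.1, 0.1),
                      episodes=200, horizon=100, lr=0.1, gamma=0.95, eps=0.2):
    rng = np.random.default_rng(0)
    Q1 = np.zeros((n_states, nA, nA))
    Q2 = np.zeros((n_states, nA, nA))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            if rng.random() < eps:
                a1, a2 = rng.integers(nA), rng.integers(nA)
            else:
                # Crude joint greedy rule for the sketch; the paper draws
                # actions from the stage-game equilibrium strategies.
                a1, a2 = np.unravel_index((Q1[s] + Q2[s]).argmax(), (nA, nA))
            s_next, (r1, r2) = env.step(a1, a2)
            v1, v2 = pure_nash_values(Q1, Q2, s_next, nA)
            Q1[s, a1, a2] += lr * risk_weight(r1 + gamma * v1 - Q1[s, a1, a2], alphas[0])
            Q2[s, a1, a2] += lr * risk_weight(r2 + gamma * v2 - Q2[s, a1, a2], alphas[1])
            s = s_next
    return Q1, Q2

Calling risk_aware_nash_q(ToyQueueGame(), alphas=(0.1, 0.1)) versus alphas=(0.95, 0.1) mirrors the symmetric and asymmetric risk settings quoted from Tables 2 and 3 above, but only at the level of this toy sketch, not as a reproduction of the paper's results.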