Model and Reinforcement Learning for Markov Games with Risk Preferences

Authors: Wenjie Huang, Viet Hai Pham, William Benjamin Haskell

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
Researcher Affiliation | Academia | 1 Shenzhen Research Institute of Big Data (SRIBD); 2 Institute for Data and Decision Analysis, The Chinese University of Hong Kong, Shenzhen; 3 Department of Computer Science, School of Computing, National University of Singapore (NUS); 4 Supply Chain and Operations Management Area, Krannert School of Management, Purdue University
Pseudocode | Yes | Algorithm 1: Risk-aware Nash Q-learning
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | We apply our techniques to the single server exponential queuing system from (Kardes, Ordonez, and Hall 2011).
Dataset Splits | No | The paper describes a queuing system simulation and presents experimental results, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, counts, or predefined citations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using specific algorithms like SASP and discusses an 'interior point algorithm', but it does not list specific software dependencies (e.g., libraries, frameworks, or solvers) with their version numbers that are required to replicate the experiments.
Experiment Setup | Yes | The state space S represents the maximum number (30 in these experiments) of packets allowed in the system. ... The players' risk preferences are obtained by setting αi for i = 1, 2, and we allow α1 ≠ α2. (Table 2: α1 = α2 = 0.1; Table 3: α1 = 0.95, α2 = 0.1 and α1 = 0.1, α2 = 0.95)
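
To make the quoted Experiment Setup row more concrete, the sketch below sets up a toy two-player single-server queuing environment in Python, where the state is the number of packets in the system, capped at 30 as described above. The class name ToyQueueGame, the action sets, the arrival and service rates, and the cost terms are all illustrative assumptions for this sketch; they are not taken from the paper or from (Kardes, Ordonez, and Hall 2011).

import numpy as np

# Toy two-player single-server queue. The state is the number of packets in
# the system, capped at 30 packets as in the quoted experiment setup. The
# action sets, arrival/service rates, and cost terms below are illustrative
# placeholders rather than the paper's parameterization.
MAX_PACKETS = 30

class ToyQueueGame:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = 0  # packets currently in the system

    def reset(self):
        self.state = 0
        return self.state

    def step(self, a1, a2):
        # Each player chooses a service-effort level in {0, 1, 2}.
        arrivals = self.rng.poisson(1.0)            # assumed arrival rate
        served = self.rng.poisson(0.5 * (a1 + a2))  # assumed joint service rate
        self.state = int(np.clip(self.state + arrivals - served, 0, MAX_PACKETS))
        # Assumed per-step costs: a holding cost on the queue plus an effort cost.
        r1 = -self.state - 0.3 * a1
        r2 = -self.state - 0.3 * a2
        return self.state, (r1, r2)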
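
The Pseudocode row refers to the paper's Algorithm 1 (Risk-aware Nash Q-learning). The sketch below is a deliberately simplified reading of that idea, layered on the toy environment above: stage-game values are taken from pure-strategy equilibria found by enumeration (falling back to maximin values), exploration is a crude joint epsilon-greedy rule, and the risk preference αi enters only through a hand-rolled risk_weight function. Every function name, parameter, and simplification here is an assumption for illustration; the paper's actual risk operator, equilibrium computation, and update rule should be taken from Algorithm 1 itself.

import itertools
import numpy as np

def pure_nash_values(Q1, Q2, s, nA):
    # Stage-game values at state s. Sketch only: enumerate pure-strategy Nash
    # equilibria of the bimatrix game (Q1[s], Q2[s]) and fall back to each
    # player's maximin value if none exists; the paper solves the stage games
    # properly (mixed strategies included).
    for a1, a2 in itertools.product(range(nA), repeat=2):
        if Q1[s, a1, a2] >= Q1[s, :, a2].max() and Q2[s, a1, a2] >= Q2[s, a1, :].max():
            return Q1[s, a1, a2], Q2[s, a1, a2]
    return Q1[s].min(axis=1).max(), Q2[s].min(axis=0).max()

def risk_weight(td_error, alpha):
    # Stand-in risk adjustment, assuming (purely for illustration) that a
    # smaller alpha means a more risk-averse player who weights negative
    # surprises more heavily. This is not the paper's risk measure.
    return td_error if td_error >= 0 else td_error / max(alpha, 1e-6)

def risk_aware_nash_q(env, n_states=31, nA=3, alphas=(0.1, 0.1),
                      episodes=200, horizon=100, lr=0.1, gamma=0.95, eps=0.2):
    rng = np.random.default_rng(0)
    Q1 = np.zeros((n_states, nA, nA))
    Q2 = np.zeros((n_states, nA, nA))
    for _ in range(episodes):
        s = env.reset()
        for _ in range(horizon):
            if rng.random() < eps:
                a1, a2 = rng.integers(nA), rng.integers(nA)
            else:
                # Crude joint greedy rule for the sketch; the paper draws
                # actions from the stage-game equilibrium strategies.
                a1, a2 = np.unravel_index((Q1[s] + Q2[s]).argmax(), (nA, nA))
            s_next, (r1, r2) = env.step(a1, a2)
            v1, v2 = pure_nash_values(Q1, Q2, s_next, nA)
            Q1[s, a1, a2] += lr * risk_weight(r1 + gamma * v1 - Q1[s, a1, a2], alphas[0])
            Q2[s, a1, a2] += lr * risk_weight(r2 + gamma * v2 - Q2[s, a1, a2], alphas[1])
            s = s_next
    return Q1, Q2

Calling risk_aware_nash_q(ToyQueueGame(), alphas=(0.1, 0.1)) versus alphas=(0.95, 0.1) mirrors the symmetric and asymmetric risk settings quoted from Tables 2 and 3 above, but only at the level of this toy sketch, not as a reproduction of the paper's results.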