Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Model and Reinforcement Learning for Markov Games with Risk Preferences

Authors: Wenjie Huang, Viet Hai Pham, William Benjamin Haskell (pp. 2022–2029)

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
Researcher Affiliation | Academia | 1 Shenzhen Research Institute of Big Data (SRIBD); 2 Institute for Data and Decision Analysis, The Chinese University of Hong Kong, Shenzhen; 3 Department of Computer Science, School of Computing, National University of Singapore (NUS); 4 Supply Chain and Operations Management Area, Krannert School of Management, Purdue University
Pseudocode | Yes | Algorithm 1: Risk-aware Nash Q-learning
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a direct link to a code repository for the methodology described.
Open Datasets | Yes | We apply our techniques to the single-server exponential queuing system from (Kardes, Ordonez, and Hall 2011).
Dataset Splits | No | The paper describes a queuing system simulation and presents experimental results, but it does not specify explicit training, validation, or test dataset splits in terms of percentages, counts, or predefined citations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using specific algorithms like SASP and discusses an 'interior point algorithm', but it does not list specific software dependencies (e.g., libraries, frameworks, or solvers) with version numbers required to replicate the experiments.
Experiment Setup | Yes | The state space S represents the maximum number (30 in these experiments) of packets allowed in the system. ... The players' risk preferences are obtained by setting αi for i = 1, 2, and we allow α1 ≠ α2. (Table 2: α1 = α2 = 0.1; Table 3: α1 = 0.95, α2 = 0.1 and α1 = 0.1, α2 = 0.95)
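The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of what a risk-aware tabular Q-update parameterized by a level α can look like, here is a minimal sketch assuming an empirical CVaR-style risk measure (how αi enters the paper's update is an assumption here, and the toy chain dynamics, state/action sizes, and sampling scheme are hypothetical; this is not the paper's Nash Q-learning on the queuing game):

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical CVaR at level alpha: mean of the worst (lowest)
    alpha-fraction of sampled returns. alpha = 1 recovers the plain
    mean, i.e. the risk-neutral case."""
    s = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(s))))
    return s[:k].mean()

# Illustrative single-player risk-aware Q-update on a toy chain
# (hypothetical dynamics, NOT the paper's two-player queuing game).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
gamma, lr, alpha = 0.9, 0.1, 0.1  # alpha = 0.1 as in the paper's Table 2

for _ in range(2000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    # Sample a batch of next-step outcomes, then apply the risk
    # measure to the sampled targets instead of averaging them.
    rewards = rng.normal(loc=a, scale=1.0, size=32)
    next_vals = Q[rng.integers(n_states, size=32)].max(axis=1)
    target = cvar(rewards + gamma * next_vals, alpha)
    Q[s, a] += lr * (target - Q[s, a])

print(Q.shape)  # (5, 2)
```

With small α the update weights only the worst sampled outcomes, so the learned values are pessimistic relative to the risk-neutral mean; this is one common way a risk-preference parameter like αi shapes value estimates.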