Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Authors: Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments in an n-player market setting where merchant agents buy/sell goods from/to customers. ... In Figure 1 we display calibrator and agents' reward evolution during training. It is seen that CALSHEQ outperforms BO in that i) the RL calibrator's rewards converge more smoothly and achieve on average better results in less time
Researcher Affiliation | Industry | Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso, J.P. Morgan AI Research, {nelson.n.vadori, sumitra.ganesh, prashant.reddy, manuela.veloso}@jpmorgan.com
Pseudocode | Yes | Algorithm 1 (CALSHEQ): Calibration of Shared Equilibria (an illustrative calibration-loop sketch follows the table)
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | No | The paper describes a custom simulation environment (an n-player market setting) rather than a publicly available dataset.
Dataset Splits | No | The paper mentions training episodes and averaging over them, but does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, or cloud computing instances with specifications).
Software Dependencies | No | The paper mentions that "All policies are trained with PPO [24]" but does not specify version numbers for PPO or any other software dependencies such as Python, TensorFlow, or PyTorch.
Experiment Setup | Yes | We conduct experiments in an n-player market setting... We consider 2 distinct supertypes for 5-10 merchant agents... resulting in 23 parameters to calibrate in total. For each supertype we have i) 10 probabilities to be connected to 10 clusters of 50 customers each (500 customers in total)... All policies are trained with PPO [24], with a KL penalty to control policy updates. (A sketch of the standard KL-penalized PPO objective also follows the table.)
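
The Pseudocode row above refers to Algorithm 1 (CALSHEQ). Based only on what the table quotes (an RL calibrator proposing supertype parameters for a market of shared-policy merchant agents, with both the calibrator and the agents receiving rewards during training), the following is a minimal Python sketch of what such an outer calibration loop around inner agent training could look like. All names (propose_params, run_market_episode, calibration_reward, etc.), the loop structure, and the assumption that the calibrator's reward measures how well the simulated market matches the calibration targets are illustrative assumptions, not the paper's Algorithm 1.

```python
from typing import Callable

import numpy as np


def calibration_loop(
    propose_params: Callable[[], np.ndarray],                 # calibrator action: candidate supertype parameters
    run_market_episode: Callable[[np.ndarray], dict],         # simulate merchants/customers under those parameters
    update_agents: Callable[[dict], None],                    # inner update of the shared agent policy (e.g., PPO)
    update_calibrator: Callable[[np.ndarray, float], None],   # outer update of the RL calibrator (e.g., PPO)
    calibration_reward: Callable[[dict], float],              # assumed: scores the episode against calibration targets
    n_iterations: int = 1000,
) -> None:
    """Illustrative outer loop: the calibrator proposes parameters, shared-policy
    agents play an episode and are trained on it, and the calibrator is rewarded
    on the resulting episode statistics. This is a hedged sketch, not the paper's code."""
    for _ in range(n_iterations):
        params = propose_params()
        episode_stats = run_market_episode(params)
        update_agents(episode_stats)
        update_calibrator(params, calibration_reward(episode_stats))
```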
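
The Software Dependencies and Experiment Setup rows state only that all policies are trained with PPO, with a KL penalty to control policy updates, without versions or further detail. For reference, below is a minimal PyTorch sketch of the standard KL-penalized PPO surrogate (the penalized form from the original PPO paper); it is the generic objective, not the authors' implementation, and the function and argument names are assumptions.

```python
import torch


def ppo_kl_penalty_loss(
    log_probs_new: torch.Tensor,   # log pi_theta(a_t | s_t) under the current policy
    log_probs_old: torch.Tensor,   # log pi_old(a_t | s_t) under the data-collecting policy
    advantages: torch.Tensor,      # advantage estimates A_t
    kl_divergence: torch.Tensor,   # per-sample KL(pi_old || pi_theta)
    beta: float,                   # KL penalty coefficient controlling update size
) -> torch.Tensor:
    """Standard KL-penalized PPO surrogate, written as a loss to minimize:
    -E[ratio * A_t] + beta * E[KL]."""
    ratio = torch.exp(log_probs_new - log_probs_old)
    return -(ratio * advantages).mean() + beta * kl_divergence.mean()
```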