Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
Authors: Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in an n-player market setting where merchant agents buy/sell goods from/to customers. ... In Figure 1 we display calibrator and agents' reward evolution during training. It is seen that CALSHEQ outperforms BO in that i) the RL calibrator's rewards converge more smoothly and achieve on average better results in less time |
| Researcher Affiliation | Industry | Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso J.P. Morgan AI Research {nelson.n.vadori, sumitra.ganesh, prashant.reddy, manuela.veloso}@jpmorgan.com |
| Pseudocode | Yes | Algorithm 1 (CALSHEQ): Calibration of Shared Equilibria (a hedged sketch of such a calibration loop appears after this table) |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | The paper describes a custom simulation environment (an n-player market setting) and does not reference any publicly available dataset. |
| Dataset Splits | No | The paper mentions training episodes and averaging over them, but does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, or cloud computing instances with specifications). |
| Software Dependencies | No | The paper mentions that "All policies are trained with PPO [24]" but does not specify version numbers for PPO or any other software dependencies such as Python, TensorFlow, or PyTorch. |
| Experiment Setup | Yes | We conduct experiments in an n-player market setting... We consider 2 distinct supertypes for 5-10 merchant agents... resulting in 23 parameters to calibrate in total. For each supertype we have i) 10 probabilities to be connected to 10 clusters of 50 customers each (500 customers in total)... All policies are trained with PPO [24], with a KL penalty to control policy updates. (The supertype parameterization is sketched immediately after this table.) |
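
To make the quoted experiment setup concrete, here is a minimal sketch of the supertype parameterization, assuming Python and NumPy. The class, function, and constant names are illustrative, not from the paper; only the 10 per-supertype cluster-connection probabilities quoted above are modeled, since the quote elides the remaining calibrated parameters.

```python
from dataclasses import dataclass
import numpy as np

N_CLUSTERS = 10              # 10 customer clusters (from the quoted setup)
CUSTOMERS_PER_CLUSTER = 50   # 500 customers in total

@dataclass
class Supertype:
    """One of the 2 supertypes from which merchant agents are sampled.
    Only the cluster-connection probabilities are shown; the paper
    calibrates 23 parameters in total, the rest elided in the quote."""
    cluster_connection_probs: np.ndarray  # shape (N_CLUSTERS,), values in [0, 1]

def sample_connections(supertype: Supertype, rng: np.random.Generator) -> np.ndarray:
    """Draw one agent's cluster connectivity from its supertype:
    a boolean mask over the 10 customer clusters."""
    return rng.uniform(size=N_CLUSTERS) < supertype.cluster_connection_probs

# Example usage (hypothetical values):
rng = np.random.default_rng(0)
st = Supertype(cluster_connection_probs=rng.uniform(size=N_CLUSTERS))
print(sample_connections(st, rng))
```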
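
Likewise, the outer loop of Algorithm 1 (CALSHEQ) can be sketched as follows. This is not the authors' algorithm verbatim: in the paper an RL calibrator proposes supertype parameters while agents learn a shared policy with PPO, whereas this self-contained sketch substitutes a simple hill-climbing calibrator and stubs out the inner PPO training loop; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 23        # number of supertype parameters to calibrate (from the paper)
N_ITERATIONS = 100   # hypothetical outer-loop budget

def train_shared_policy(supertype_params: np.ndarray) -> np.ndarray:
    """Placeholder for the inner loop: agents sampled from the supertypes
    train a single shared policy (PPO with a KL penalty in the paper) and
    the simulation emits market statistics. Stubbed with noise here."""
    return supertype_params + rng.normal(scale=0.05, size=N_PARAMS)

def calibration_reward(simulated: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical calibration objective: negative distance between
    simulated statistics and the target (observed) statistics."""
    return -float(np.linalg.norm(simulated - target))

target_stats = rng.uniform(size=N_PARAMS)   # stand-in for observed data
params = rng.uniform(size=N_PARAMS)         # initial supertype parameters
best = calibration_reward(train_shared_policy(params), target_stats)

for _ in range(N_ITERATIONS):
    # The paper's calibrator is itself an RL policy; a perturb-and-accept
    # step is used here only to keep the sketch runnable and short.
    candidate = np.clip(params + rng.normal(scale=0.02, size=N_PARAMS), 0.0, 1.0)
    reward = calibration_reward(train_shared_policy(candidate), target_stats)
    if reward > best:
        params, best = candidate, reward
```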