Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
Authors: Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in an n-player market setting where merchant agents buy/sell goods from/to customers. ... In Figure 1 we display calibrator and agents' reward evolution during training. It is seen that CALSHEQ outperforms BO in that i) the RL calibrator's rewards converge more smoothly and achieve on average better results in less time |
| Researcher Affiliation | Industry | Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso J.P. Morgan AI Research {nelson.n.vadori, sumitra.ganesh, prashant.reddy, manuela.veloso}@jpmorgan.com |
| Pseudocode | Yes | Algorithm 1 (CALSHEQ): Calibration of Shared Equilibria (a hedged sketch of such a calibration loop appears after this table) |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | The paper describes a custom simulation environment (an n-player market setting) and does not reference any publicly available dataset. |
| Dataset Splits | No | The paper mentions training episodes and averaging over them, but does not provide specific details on dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, or cloud computing instances with specifications). |
| Software Dependencies | No | The paper mentions that "All policies are trained with PPO [24]" but does not specify version numbers for PPO or any other software dependencies such as Python, TensorFlow, or PyTorch. |
| Experiment Setup | Yes | We conduct experiments in an n-player market setting... We consider 2 distinct supertypes for 5-10 merchant agents... resulting in 23 parameters to calibrate in total. For each supertype we have i) 10 probabilities to be connected to 10 clusters of 50 customers each (500 customers in total)... All policies are trained with PPO [24], with a KL penalty to control policy updates. (The supertype parameterization is sketched immediately after this table.) |
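
To make the quoted experiment setup concrete, here is a minimal sketch of the supertype parameterization, assuming Python and NumPy. The class, function, and constant names are illustrative, not from the paper; only the 10 per-supertype cluster-connection probabilities quoted above are modeled, since the quote elides the remaining calibrated parameters.

```python
from dataclasses import dataclass
import numpy as np

N_CLUSTERS = 10              # 10 customer clusters (from the quoted setup)
CUSTOMERS_PER_CLUSTER = 50   # 500 customers in total

@dataclass
class Supertype:
    """One of the 2 supertypes from which merchant agents are sampled.
    Only the cluster-connection probabilities are shown; the paper
    calibrates 23 parameters in total, the rest elided in the quote."""
    cluster_connection_probs: np.ndarray  # shape (N_CLUSTERS,), values in [0, 1]

def sample_connections(supertype: Supertype, rng: np.random.Generator) -> np.ndarray:
    """Draw one agent's cluster connectivity from its supertype:
    a boolean mask over the 10 customer clusters."""
    return rng.uniform(size=N_CLUSTERS) < supertype.cluster_connection_probs

# Example usage (hypothetical values):
rng = np.random.default_rng(0)
st = Supertype(cluster_connection_probs=rng.uniform(size=N_CLUSTERS))
print(sample_connections(st, rng))
```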
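
Likewise, the outer loop of Algorithm 1 (CALSHEQ) can be sketched as follows. This is not the authors' algorithm verbatim: in the paper an RL calibrator proposes supertype parameters while agents learn a shared policy with PPO, whereas this self-contained sketch substitutes a simple hill-climbing calibrator and stubs out the inner PPO training loop; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 23        # number of supertype parameters to calibrate (from the paper)
N_ITERATIONS = 100   # hypothetical outer-loop budget

def train_shared_policy(supertype_params: np.ndarray) -> np.ndarray:
    """Placeholder for the inner loop: agents sampled from the supertypes
    train a single shared policy (PPO with a KL penalty in the paper) and
    the simulation emits market statistics. Stubbed with noise here."""
    return supertype_params + rng.normal(scale=0.05, size=N_PARAMS)

def calibration_reward(simulated: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical calibration objective: negative distance between
    simulated statistics and the target (observed) statistics."""
    return -float(np.linalg.norm(simulated - target))

target_stats = rng.uniform(size=N_PARAMS)   # stand-in for observed data
params = rng.uniform(size=N_PARAMS)         # initial supertype parameters
best = calibration_reward(train_shared_policy(params), target_stats)

for _ in range(N_ITERATIONS):
    # The paper's calibrator is itself an RL policy; a perturb-and-accept
    # step is used here only to keep the sketch runnable and short.
    candidate = np.clip(params + rng.normal(scale=0.02, size=N_PARAMS), 0.0, 1.0)
    reward = calibration_reward(train_shared_policy(candidate), target_stats)
    if reward > best:
        params, best = candidate, reward
```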