State-Constrained Zero-Sum Differential Games with One-Sided Information

Authors: Mukesh Ghimire, Lei Zhang, Zhe Xu, Yi Ren

ICML 2024

Reproducibility assessment (variable, result, and LLM response):

Research Type: Experimental
"We use a simplified football game to demonstrate the utility of this work, where we reveal player positions and belief states in which the attacker should (or should not) play specific random deceptive moves to take advantage of information asymmetry, and compute how the defender should respond. In Sec. 6, we solve an 8D man-to-man matchup game and reveal player positions in which the attacker can take advantage of information asymmetry by playing specific deceptive moves, and to derive the defender's best response in the lack of information. See Fig. 2."

Researcher Affiliation: Academia
"Department of Mechanical and Aerospace Engineering, Arizona State University, Tempe, AZ, USA. Correspondence to: Yi Ren <yiren@asu.edu>."

Pseudocode: Yes
"Alg. 1 summarizes the value approximation algorithm."

Open Source Code: Yes
"The code for the implementation is available at https://github.com/ghimiremukesh/OSIIG."

Open Datasets: No
The paper describes using a "simplified football game" and sampling states from Q(t) and P to generate training data; it does not refer to a publicly available, established dataset.

Dataset Splits: No
The paper mentions training data for value approximation but does not specify train/validation/test splits via percentages, counts, or references to predefined splits.

Hardware Specification: No
The paper mentions using "56 CPU cores" for parallel computation but does not give specific hardware details such as GPU or CPU models or memory specifications.

Software Dependencies: No
The paper mentions software components such as a Physics-Informed Neural Network (PINN), the Adam optimizer, and (through the released code) Python and PyTorch, but it does not specify version numbers for these dependencies.

Experiment Setup: Yes
"The value network uses PICNN with 5 hidden layers and 256 neurons each and has 9-dimensional inputs (state and belief). We train 10 separate networks for each time step starting from t = 0.9 with τ = 0.1, each being trained for 10 epochs. For each epoch, S(t) includes 5000 states sampled from Q(t). The PINN utilizes a fully-connected network with 3 hidden layers, each comprising 512 neurons with sin activation function. The network adopts the Adam optimizer with a fixed learning rate of 2 × 10⁻⁵."
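The PINN portion of the reported setup can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the 9-dimensional input (stated for the value network) and the scalar output dimension are assumptions here, and the PICNN convexity structure of the value network is omitted entirely.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    """Elementwise sin activation, as described for the paper's PINN."""
    def forward(self, x):
        return torch.sin(x)

def make_pinn(in_dim=9, hidden=512, depth=3, out_dim=1):
    """Fully connected network with 3 hidden layers of 512 sin-activated
    neurons, matching the PINN description; in_dim/out_dim are assumptions."""
    layers = [nn.Linear(in_dim, hidden), Sine()]
    for _ in range(depth - 1):
        layers += [nn.Linear(hidden, hidden), Sine()]
    layers.append(nn.Linear(hidden, out_dim))
    return nn.Sequential(*layers)

model = make_pinn()
# Adam with the fixed learning rate of 2 × 10⁻⁵ reported in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# Illustrative forward pass on dummy inputs: 5000 states per epoch, 9-D each.
states = torch.rand(5000, 9)
values = model(states)  # predicted values, shape (5000, 1)
```

A real training loop would additionally impose the PDE residual loss that makes this a physics-informed network; that loss depends on the game's Hamilton-Jacobi structure and is not reproduced here.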