Learning to Optimize Differentiable Games

Authors: Xuxi Chen, Nelson Vadori, Tianlong Chen, Zhangyang Wang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On test problems including quadratic games and GANs, L2PG can substantially accelerate convergence and demonstrates a remarkably more stable trajectory. Code is available at https://github.com/VITA-Group/L2PG.
Researcher Affiliation | Collaboration | 1 University of Texas at Austin, 2 J.P. Morgan AI Research.
Pseudocode | Yes | A.1. Algorithms: We provide a summary of L2PG's pipeline in Algorithm 1 (Algorithm 1: L2PG).
Open Source Code | Yes | Code is available at https://github.com/VITA-Group/L2PG.
Open Datasets | Yes | A.2. Sampled Game Coefficients: As mentioned above, we sampled a fixed evaluation set and two test sets of quadratic games. The coefficients of the 60 games are provided in three files: evaluation.txt, test_stable.txt, and test_unstable.txt. Each line in a file represents one game and contains 6 numbers, representing M11, M22, M12, M21, b1, b2, respectively. (See the parsing sketch below the table.)
Dataset Splits | Yes | We evaluate the L2O optimizer on a fixed set of game instances of the same type (i.e., quadratic or GANs) every 5 epochs, and the optimizer with the highest evaluation performance is used at the meta-testing stage. (See the selection sketch below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions optimizers such as Adam and RMSprop and model components such as an LSTM network, but it does not specify version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | We use an LSTM optimizer with a hidden dimension of 32 in all experiments. A detailed explanation of the structure of the L2O optimizer can be found in Section B. The unroll length (i.e., the value of T) is set to 10. We batch the training process by simultaneously training on 128 different games, and we train the optimizer for 300 epochs. The number of training iterations per epoch increases through {50, 100, 200, 500, 1000} if the Training-CL technique is applied; otherwise we set it to 100. We train the parameters of L2PG (i.e., ϕ) with the Adam optimizer (Kingma & Ba, 2014), using an initial learning rate of 1 × 10^-3 and decaying the learning rate by a factor of 10 every 1/3 of the total number of training epochs. (See the schedule sketch below the table.)
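The Open Datasets row quotes a simple text format for the quadratic-game coefficients: one game per line, six whitespace-separated numbers. Below is a minimal parsing sketch, assuming that format; the function name and the use of NumPy are illustrative assumptions, not part of the released code.

```python
# Minimal sketch for reading the coefficient files described in the paper's appendix
# (evaluation.txt, test_stable.txt, test_unstable.txt). Each non-empty line is assumed
# to hold the six numbers M11, M22, M12, M21, b1, b2 for one quadratic game.
import numpy as np

def load_quadratic_games(path):
    """Return a list of (M, b) pairs, one per game, from a coefficient file."""
    games = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            m11, m22, m12, m21, b1, b2 = map(float, line.split())
            M = np.array([[m11, m12],
                          [m21, m22]])
            b = np.array([b1, b2])
            games.append((M, b))
    return games

# Example usage (file names taken from the paper's appendix):
# eval_games = load_quadratic_games("evaluation.txt")
```
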
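The Dataset Splits row describes the model-selection rule: evaluate the L2O optimizer on a fixed set of games every 5 epochs and carry the checkpoint with the best evaluation performance into meta-testing. The sketch below illustrates only that rule; `train_one_epoch` and `evaluate_on_fixed_games` are hypothetical stubs standing in for the paper's actual training and evaluation code.

```python
# Sketch of the "evaluate every 5 epochs, keep the best checkpoint" selection rule.
# The two stubs below are placeholders, not functions from the L2PG repository.
import copy
import random

def train_one_epoch(state):          # stub: one meta-training epoch
    state["epoch"] += 1

def evaluate_on_fixed_games(state):  # stub: score on the fixed evaluation games
    return random.random()

def meta_train(num_epochs=300, eval_every=5):
    state = {"epoch": 0}
    best_score, best_checkpoint = float("-inf"), None
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(state)
        if epoch % eval_every == 0:
            score = evaluate_on_fixed_games(state)
            if score > best_score:
                best_score, best_checkpoint = score, copy.deepcopy(state)
    return best_checkpoint  # the checkpoint later used at meta-testing
```
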
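The Experiment Setup row specifies Adam with an initial learning rate of 1e-3, decayed by a factor of 10 every third of the 300 training epochs (i.e., at epochs 100 and 200). Here is a minimal PyTorch sketch of that schedule, assuming a generic LSTM stands in for the L2O optimizer; its input size and exact architecture are assumptions, as the paper only states a hidden dimension of 32.

```python
# Sketch of the meta-training optimizer and learning-rate schedule quoted above.
import torch

NUM_EPOCHS = 300
l2o_net = torch.nn.LSTM(input_size=4, hidden_size=32)   # hidden dim 32 per the paper;
                                                         # input size is an assumption
meta_opt = torch.optim.Adam(l2o_net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    meta_opt,
    milestones=[NUM_EPOCHS // 3, 2 * NUM_EPOCHS // 3],   # epochs 100 and 200
    gamma=0.1,                                           # divide the LR by 10
)

for epoch in range(NUM_EPOCHS):
    # ... unrolled meta-training over a batch of 128 games would go here ...
    scheduler.step()
```
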