Learning to Optimize Differentiable Games
Authors: Xuxi Chen, Nelson Vadori, Tianlong Chen, Zhangyang Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On test problems including quadratic games and GANs, L2PG can substantially accelerate the convergence, and demonstrates a remarkably more stable trajectory. Codes are available at https://github.com/VITA-Group/L2PG. |
| Researcher Affiliation | Collaboration | 1 University of Texas at Austin; 2 J.P. Morgan AI Research. |
| Pseudocode | Yes | A.1. Algorithms: We provide a summary of L2PG's pipeline in Algorithm 1 (Algorithm 1: L2PG). |
| Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/L2PG. |
| Open Datasets | Yes | A.2. Sampled Game Coefficients: As aforementioned, we have sampled a fixed evaluation set and two testing sets of quadratic games. The coefficients of the 60 games are provided in three files: evaluation.txt, test_stable.txt, and test_unstable.txt. Each line in a file represents a game, containing 6 numbers that represent M11, M22, M12, M21, b1, b2, respectively. (A minimal parsing sketch is provided after this table.) |
| Dataset Splits | Yes | We evaluate the L2O optimizer on a fixed set of game instances with the same type (i.e., quadratic or GANs) every 5 epochs, and the optimizer with the highest evaluation performance will be used at the meta-testing stage. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam optimizer' and 'RMSprop optimizer' and model components like 'LSTM network', but it does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We use an LSTM optimizer with a hidden dimension of 32 in all experiments. A detailed explanation of the structure of the L2O optimizer can be found in Section B. The unroll length (i.e., the value of T) is set to 10. We batch-ify the training process by simultaneously training on 128 different games, and we train the optimizer for 300 epochs. The number of training iterations in each epoch takes values from {50, 100, 200, 500, 1000} increasingly if the Training-CL technique is applied; otherwise we set the number of training iterations to 100. We train the parameters in L2PG (i.e., ϕ) using the Adam optimizer (Kingma & Ba, 2014), with an initial learning rate of 1 × 10⁻³. We decay the learning rate by 10 every 1/3 of the total number of training epochs. (A hedged PyTorch sketch of this setup is provided after this table.) |
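
The Open Datasets row quotes the on-disk format of the quadratic-game coefficient files: six numbers per line, interpreted as M11, M22, M12, M21, b1, b2. Below is a minimal parsing sketch assuming that layout; the function name `load_games` and the 2×2 packing of M are assumptions, not taken from the paper or its repository.

```python
# Minimal sketch: parse one of the coefficient files (evaluation.txt,
# test_stable.txt, test_unstable.txt) described in the paper's appendix.
# The six-numbers-per-line layout (M11, M22, M12, M21, b1, b2) is from the
# paper; the function name and the 2x2 arrangement of M are assumptions.
from pathlib import Path

def load_games(path):
    """Return a list of (M, b) pairs, one per quadratic game in the file."""
    games = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        m11, m22, m12, m21, b1, b2 = map(float, line.split())
        M = [[m11, m12],
             [m21, m22]]  # assumed arrangement of the four M coefficients
        games.append((M, [b1, b2]))
    return games

# Example usage (file names as listed in the appendix):
# eval_games = load_games("evaluation.txt")
```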
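
The Experiment Setup and Dataset Splits rows quote concrete meta-training hyperparameters: an LSTM optimizer with hidden dimension 32, unroll length T = 10, batches of 128 games, 300 epochs, Adam at 1 × 10⁻³ with a 10× learning-rate decay every third of training, and evaluation on a fixed game set every 5 epochs to select the checkpoint used at meta-testing. The sketch below wires those numbers into a PyTorch optimizer/scheduler skeleton; the LSTM input size, the `evaluate` hook, and the loop body are placeholders, not the authors' implementation.

```python
# Hedged PyTorch skeleton matching the hyperparameters quoted above.
# The L2PG unrolled meta-objective itself is not reproduced here; only the
# Adam optimizer, step-decay schedule, and every-5-epochs checkpoint selection.
import torch

EPOCHS, UNROLL_T, GAMES_PER_BATCH = 300, 10, 128

# LSTM optimizer with hidden dimension 32 (input_size=4 is a placeholder assumption).
phi = torch.nn.LSTM(input_size=4, hidden_size=32)
meta_opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
# Decay the learning rate by 10x every 1/3 of the total epochs (here: epochs 100 and 200).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    meta_opt, milestones=[EPOCHS // 3, 2 * EPOCHS // 3], gamma=0.1
)

best_score, best_state = float("-inf"), None
for epoch in range(1, EPOCHS + 1):
    # ... unrolled meta-training over GAMES_PER_BATCH games, truncated at UNROLL_T steps ...
    scheduler.step()
    if epoch % 5 == 0:
        score = evaluate(phi)  # hypothetical evaluation on the fixed set of game instances
        if score > best_score:
            best_score = score
            best_state = {k: v.clone() for k, v in phi.state_dict().items()}
# best_state is the checkpoint carried over to the meta-testing stage.
```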