Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Authors: Maithra Raghu, Alex Irpan, Jacob Andreas, Bobby Kleinberg, Quoc Le, Jon Kleinberg
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyze multiagent play, and even develop a self play algorithm. |
| Researcher Affiliation | Collaboration | 1Google Brain 2Cornell University 3University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Self Play with Binary Search |
| Open Source Code | No | The paper mentions using 'Open AI Baselines implementations' but does not provide access to its own source code for the methodology described. |
| Open Datasets | Yes | We first introduce the family of Attacker-Defender Games (Spencer, 1994), a set of games with two properties that yield a particularly attractive testbed for deep reinforcement learning: the ability to continuously vary the difficulty of the environment through two parameters, and the existence of a closed form solution that is expressible as a linear model. |
| Dataset Splits | No | The paper mentions training and testing but does not specify validation data splits or percentages. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Baselines implementations' and specific RL algorithms (PPO, A2C, DQN) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We set up the Attacker-Defender environment as follows: the game state is represented by a K + 1 dimensional vector for levels 0 to K, with coordinate l representing the number of pieces at level l. For the defender agent, the input is the concatenation of the partition A, B, giving a 2(K + 1) dimensional vector. The start state S0 is initialized randomly from a distribution over start states of a certain potential. |
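The state representation and closed-form defender strategy quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the standard Erdos-Selfridge convention that surviving pieces advance one level toward level 0, that the attacker wins when a piece reaches level 0, and that a piece at level l contributes 2^(-l) to the potential; the function names are our own.

```python
import numpy as np

def potential(state):
    """Erdos-Selfridge potential: a piece at level l contributes 2**-l.

    `state` is the (K+1)-dim vector from the paper, where coordinate l
    counts the pieces at level l.
    """
    levels = np.arange(len(state))
    return float(np.sum(state * 2.0 ** (-levels)))

def defender_input(A, B):
    """Defender observation: the concatenated partition (A, B),
    giving the 2(K+1)-dim vector described in the setup."""
    return np.concatenate([A, B])

def optimal_defender_move(A, B):
    """Closed-form optimal play (a linear function of the state):
    destroy whichever set carries the larger potential."""
    return 'A' if potential(A) >= potential(B) else 'B'

def step(A, B, destroyed):
    """Destroy one set; the survivors advance one level toward 0
    (assumed convention). Pieces already at level 0 stay there,
    which corresponds to an attacker win."""
    survivor = B if destroyed == 'A' else A
    next_state = np.zeros_like(survivor)
    next_state[:-1] = survivor[1:]   # level l -> l - 1
    next_state[0] += survivor[0]     # level-0 pieces remain
    return next_state
```

Under this convention, the defender has a winning strategy exactly when the start-state potential is below 1, which is why the paper can randomly initialize S0 from a distribution over start states of a chosen potential to control difficulty.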