Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

Authors: Maithra Raghu, Alex Irpan, Jacob Andreas, Bobby Kleinberg, Quoc Le, Jon Kleinberg

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyze multiagent play, and even develop a self play algorithm.
Researcher Affiliation | Collaboration | Google Brain, Cornell University, and University of California, Berkeley.
Pseudocode | Yes | Algorithm 1: Self Play with Binary Search
Open Source Code | No | The paper mentions using 'OpenAI Baselines implementations' but does not provide access to its own source code for the methodology described.
Open Datasets | Yes | We first introduce the family of Attacker-Defender Games (Spencer, 1994), a set of games with two properties that yield a particularly attractive testbed for deep reinforcement learning: the ability to continuously vary the difficulty of the environment through two parameters, and the existence of a closed form solution that is expressible as a linear model. (The potential function behind this closed-form solution is sketched below the table.)
Dataset Splits | No | The paper mentions training and testing but does not specify validation data splits or percentages.
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper mentions 'OpenAI Baselines implementations' and specific RL algorithms (PPO, A2C, DQN) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We set up the Attacker-Defender environment as follows: the game state is represented by a K + 1 dimensional vector for levels 0 to K, with coordinate l representing the number of pieces at level l. For the defender agent, the input is the concatenation of the partition A, B, giving a 2(K + 1) dimensional vector. The start state S0 is initialized randomly from a distribution over start states of a certain potential. (A sketch of this observation encoding and start-state sampling follows the table.)
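
For context on the Open Datasets row: the closed form solution quoted there is the classic Erdos-Selfridge potential argument, in which a piece at level l is worth 2^(l - K) and the defender can win from any start state with potential below 1 by always destroying the half of the attacker's partition with the larger potential. A minimal sketch of that rule follows; the function names and NumPy encoding are illustrative, not taken from the paper.

```python
import numpy as np

def potential(state, K):
    """Erdos-Selfridge potential of a game state.

    `state` is the (K + 1)-dimensional vector of piece counts for levels
    0..K; a piece at level l contributes 2^(l - K), so a piece at the top
    level K contributes 1.
    """
    levels = np.arange(K + 1)
    return float(np.dot(state, 2.0 ** (levels - K)))

def optimal_defender_move(A, B, K):
    """Destroy the half of the attacker's partition with the larger potential.

    Returns 0 to destroy A, 1 to destroy B. If the start potential is below 1,
    this rule keeps the potential from ever increasing, so no piece can reach
    level K and the defender wins. Because the potential is linear in the
    piece counts, the decision is a linear function of the concatenated
    observation [A, B] -- the sense in which the optimal policy is
    "expressible as a linear model".
    """
    return 0 if potential(A, K) >= potential(B, K) else 1
```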
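
The Experiment Setup row specifies the observation encoding but not how start states of a given potential are drawn. Below is a small sketch that builds the 2(K + 1)-dimensional defender observation from a partition (A, B) and samples a start state with one simple, assumed scheme (add pieces at random levels below K until the next piece would push the potential past the target); the paper's actual sampling distribution may differ.

```python
import numpy as np

def defender_observation(A, B):
    """Defender input: the two halves (A, B) of the attacker's proposed
    partition, each a (K + 1)-dimensional count vector, concatenated into a
    2(K + 1)-dimensional vector as described in the experiment setup."""
    return np.concatenate([A, B]).astype(np.float32)

def sample_start_state(K, target_potential, rng=None):
    """Draw a random start state S0 whose potential is close to (and at most)
    `target_potential`.

    Assumed scheme: repeatedly place a piece at a uniformly random level
    below K, stopping once the next piece would exceed the target potential.
    This is one simple way to realize "a distribution over start states of a
    certain potential"; the paper may use a different sampler.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.zeros(K + 1, dtype=np.int64)
    value = 2.0 ** (np.arange(K + 1) - K)   # a piece at level l is worth 2^(l - K)
    pot = 0.0
    while True:
        level = int(rng.integers(0, K))     # keep start pieces strictly below level K
        if pot + value[level] > target_potential:
            return state
        state[level] += 1
        pot += value[level]
```

For example, with K = 5 and target_potential = 0.99, sample_start_state returns a count vector over levels 0..4, and defender_observation(A, B) yields the 2(K + 1) = 12 dimensional input for the defender network.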