Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Authors: Maithra Raghu, Alex Irpan, Jacob Andreas, Bobby Kleinberg, Quoc Le, Jon Kleinberg
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyze multiagent play, and even develop a self play algorithm. |
| Researcher Affiliation | Collaboration | 1Google Brain 2Cornell University 3University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Self Play with Binary Search |
| Open Source Code | No | The paper mentions using 'Open AI Baselines implementations' but does not provide access to its own source code for the methodology described. |
| Open Datasets | Yes | We first introduce the family of Attacker-Defender Games (Spencer, 1994), a set of games with two properties that yield a particularly attractive testbed for deep reinforcement learning: the ability to continuously vary the difficulty of the environment through two parameters, and the existence of a closed form solution that is expressible as a linear model. |
| Dataset Splits | No | The paper mentions training and testing but does not specify validation data splits or percentages. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Baselines implementations' and specific RL algorithms (PPO, A2C, DQN) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We set up the Attacker-Defender environment as follows: the game state is represented by a K + 1 dimensional vector for levels 0 to K, with coordinate l representing the number of pieces at level l. For the defender agent, the input is the concatenation of the partition A, B, giving a 2(K + 1) dimensional vector. The start state S0 is initialized randomly from a distribution over start states of a certain potential. |