Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

Authors: Maithra Raghu, Alex Irpan, Jacob Andreas, Bobby Kleinberg, Quoc Le, Jon Kleinberg

ICML 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyze multiagent play, and even develop a self play algorithm.
Researcher Affiliation	Collaboration	¹Google Brain ²Cornell University ³University of California, Berkeley.
Pseudocode	Yes	Algorithm 1 Self Play with Binary Search
Open Source Code	No	The paper mentions using 'Open AI Baselines implementations' but does not provide access to its own source code for the methodology described.
Open Datasets	Yes	We first introduce the family of Attacker-Defender Games (Spencer, 1994), a set of games with two properties that yield a particularly attractive testbed for deep reinforcement learning: the ability to continuously vary the difficulty of the environment through two parameters, and the existence of a closed form solution that is expressible as a linear model.
Dataset Splits	No	The paper mentions training and testing but does not specify validation data splits or percentages.
Hardware Specification	No	The paper does not specify any particular hardware components (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies	No	The paper mentions 'Open AI Baselines implementations' and specific RL algorithms (PPO, A2C, DQN) but does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	We set up the Attacker-Defender environment as follows: the game state is represented by a K + 1 dimensional vector for levels 0 to K, with coordinate l representing the number of pieces at level l. For the defender agent, the input is the concatenation of the partition A, B, giving a 2(K + 1) dimensional vector. The start state S0 is initialized randomly from a distribution over start states of a certain potential.
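
The state and defender-input encoding quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. The function names (`potential`, `defender_input`) and the example values are hypothetical and not taken from the paper's code, which is not available. The potential weighting used here (a piece at level l contributes 2^-(K - l)) is an assumption based on the usual definition for these games; the quoted text does not state it. Random sampling of start states is not modeled, since the quoted text does not specify the distribution.

```python
def potential(state):
    """Assumed potential: a piece at level l contributes 2 ** -(K - l).

    `state` is a list of length K + 1; state[l] is the number of pieces
    at level l, matching the encoding quoted in the table.
    """
    K = len(state) - 1
    return sum(n * 2.0 ** -(K - l) for l, n in enumerate(state))


def defender_input(part_a, part_b):
    """Concatenate the partition (A, B) into one 2(K + 1)-dimensional vector."""
    if len(part_a) != len(part_b):
        raise ValueError("A and B must both have K + 1 entries")
    return list(part_a) + list(part_b)


# Hypothetical example with K = 3 (levels 0 to 3).
state = [4, 2, 1, 0]    # 4 pieces at level 0, 2 at level 1, 1 at level 2
part_a = [2, 1, 1, 0]   # pieces assigned to set A
part_b = [2, 1, 0, 0]   # remaining pieces form set B

obs = defender_input(part_a, part_b)
print(len(obs))                                   # 2 * (K + 1) entries
print(potential(state), potential(part_a), potential(part_b))
```

Under the assumed weighting, the potential is additive over the partition, so the potentials of A and B sum to the potential of the full state.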