Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Interactive Fiction Games: A Colossal Adventure

Authors: Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan. Pages 7903-7910.

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the agents across a set of thirty-two Jericho-supported games with the aims of 1) showing the feasibility of reinforcement learning on a variety of different IF games, 2) creating a reproducible benchmark for future work, 3) investigating the difference between choice-based and template-based action spaces, and 4) comparing performance of general IF game playing agents (NAIL), single-game agents (DRRN and TDQN), and a random agent (RAND) which uniformly sample commands from a set of canonical actions.
Researcher Affiliation	Collaboration	Matthew Hausknecht (Microsoft Research AI), Prithviraj Ammanabrolu (Georgia Institute of Technology), Marc-Alexandre Côté (Microsoft Research Montréal), Xingdi Yuan (Microsoft Research Montréal)
Pseudocode	Yes	Algorithm 1 Procedure for Identifying Valid Actions
Open Source Code	Yes	Jericho is available at https://github.com/microsoft/jericho.
Open Datasets	Yes	Jericho supports a set of fifty-six human-made IF games that cover a variety of genres... There exists a large collection of over a thousand unsupported games [3], which may be useful for unsupervised pretraining or intrinsic motivation. [3] https://github.com/BYU-PCCL/z-machine-games
Dataset Splits	No	The paper mentions 'Additional experiment details and hyperparameters are located in the supplementary material' but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) in the main text.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies	No	The paper mentions software components like 'Python-based IF environment', 'Sentence Piece model', and 'GRU encoders' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	No	The paper states 'Additional experiment details and hyperparameters are located in the supplementary material' but does not include specific experimental setup details (e.g., concrete hyperparameter values, training configurations, or system-level settings) within the main text.
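
The Research Type row above quotes the paper's description of the RAND baseline, which samples commands uniformly from a set of canonical actions. A minimal sketch of that idea follows, assuming a hypothetical action list and no game environment; the action strings and function names are illustrative and are not taken from the paper or from the Jericho API.

```python
import random

# Hypothetical stand-in for a set of canonical actions. The actual list used
# by the RAND agent in the paper is not reproduced on this page.
CANONICAL_ACTIONS = ["north", "south", "east", "west", "look", "inventory", "take all"]


def random_agent_step(actions, rng):
    """Pick one command uniformly at random from the action set."""
    return rng.choice(actions)


def run_episode(actions, num_steps, seed=0):
    """Return the sequence of commands a uniform random agent would issue."""
    # A seeded generator keeps the sampled sequence repeatable across runs.
    rng = random.Random(seed)
    return [random_agent_step(actions, rng) for _ in range(num_steps)]


commands = run_episode(CANONICAL_ACTIONS, num_steps=5, seed=0)
```

Fixing the seed is the one design choice worth noting here: it makes the baseline's command sequence repeatable, which is the kind of detail the Experiment Setup variable looks for.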