Benchmarking the Spectrum of Agent Capabilities

Authors: Danijar Hafner

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baselines scores of reward agents and unsupervised agents." (Abstract)
Researcher Affiliation | Collaboration | Danijar Hafner, Google Research, Brain Team and University of Toronto, mail@danijar.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website." https://danijar.com/crafter (Section 4, EXPERIMENTS)
Open Datasets | Yes | "The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website." https://danijar.com/crafter (Section 4, EXPERIMENTS)
Dataset Splits | No | No explicit train/validation/test splits are described; evaluation instead uses a fixed interaction budget: "An agent is granted a budget of 1M environment steps to interact with the environment. The agent performance is evaluated through success rates of the individual achievements throughout its training, as well as an aggregated score." (Section 3.3, EVALUATION PROTOCOL) See the score sketch after this table.
Hardware Specification | No | "All agents trained for 1M environment steps in under 24 hours on a single GPU" (Section 4.1, BENCHMARK WITH REWARDS); no specific GPU model or further hardware details are given.
Software Dependencies | Yes | Installation commands are listed in Figure 2: "$ python3 -m pip install crafter" (install Crafter) and "$ python3 -m pip install pygame" (needed for the human interface). See the usage sketch after this table.
Experiment Setup | No | "We used its default hyper parameters for Atari and increased the model size." (Section 4.1, BENCHMARK WITH REWARDS); specific hyperparameter values are not provided directly in the paper but are deferred to external works or described only as defaults.
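
Score sketch for the evaluation protocol quoted under Dataset Splits: the aggregated score combines the per-achievement success rates collected during the 1M-step budget. The snippet below is a minimal sketch, assuming the geometric-mean aggregation used by the Crafter benchmark (success rates s_i given in percent, score = exp(mean(ln(1 + s_i))) - 1); the function name crafter_score and the example rates are illustrative, not taken from the paper.

import math

def crafter_score(success_rates):
    # success_rates: per-achievement success rates in percent (0-100),
    # one entry per achievement, measured over all training episodes.
    logs = [math.log(1 + rate) for rate in success_rates]
    return math.exp(sum(logs) / len(logs)) - 1  # aggregated score, in percent

# Illustrative example: a few achievements solved often, most rarely.
rates = [90.0, 75.0, 40.0, 10.0] + [1.0] * 18
print(round(crafter_score(rates), 1))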
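Usage sketch for the dependencies above: the Figure 2 commands install the crafter package, which exposes a Gym-style interface. The snippet below is a minimal sketch based on the project's public repository rather than the paper itself; the environment id CrafterReward-v1 (CrafterNoReward-v1 for the unsupervised setting) and the crafter.Recorder logging wrapper are assumptions taken from that repository.

import gym
import crafter  # installed via: python3 -m pip install crafter

# Assumed environment id from the public repository; importing crafter registers it.
env = gym.make('CrafterReward-v1')  # 'CrafterNoReward-v1' for the unsupervised setting
# Assumed logging wrapper that records per-achievement statistics for scoring.
env = crafter.Recorder(env, './logdir', save_stats=True, save_video=False)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # info reports achievement progress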