Benchmarking the Spectrum of Agent Capabilities

Authors: Danijar Hafner

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baselines scores of reward agents and unsupervised agents." (Abstract)
Researcher Affiliation | Collaboration | Danijar Hafner, Google Research, Brain Team and University of Toronto, mail@danijar.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website." https://danijar.com/crafter (Section 4, EXPERIMENTS)
Open Datasets | Yes | "The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website." https://danijar.com/crafter (Section 4, EXPERIMENTS)
Dataset Splits | No | No explicit train/validation/test splits are described; evaluation instead uses a fixed interaction budget: "An agent is granted a budget of 1M environment steps to interact with the environment. The agent performance is evaluated through success rates of the individual achievements throughout its training, as well as an aggregated score." (Section 3.3, EVALUATION PROTOCOL) See the score sketch after this table.
Hardware Specification | No | "All agents trained for 1M environment steps in under 24 hours on a single GPU" (Section 4.1, BENCHMARK WITH REWARDS); no specific GPU model or further hardware details are given.
Software Dependencies | Yes | Installation commands are listed in Figure 2: "$ python3 -m pip install crafter" (install Crafter) and "$ python3 -m pip install pygame" (needed for the human interface). See the usage sketch after this table.
Experiment Setup | No | "We used its default hyper parameters for Atari and increased the model size." (Section 4.1, BENCHMARK WITH REWARDS); specific hyperparameter values are not provided directly in the paper but are deferred to external works or described only as defaults.
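
Score sketch for the evaluation protocol quoted under Dataset Splits: the aggregated score combines the per-achievement success rates collected during the 1M-step budget. The snippet below is a minimal sketch, assuming the geometric-mean aggregation used by the Crafter benchmark (success rates s_i given in percent, score = exp(mean(ln(1 + s_i))) - 1); the function name crafter_score and the example rates are illustrative, not taken from the paper.

import math

def crafter_score(success_rates):
    # success_rates: per-achievement success rates in percent (0-100),
    # one entry per achievement, measured over all training episodes.
    logs = [math.log(1 + rate) for rate in success_rates]
    return math.exp(sum(logs) / len(logs)) - 1  # aggregated score, in percent

# Illustrative example: a few achievements solved often, most rarely.
rates = [90.0, 75.0, 40.0, 10.0] + [1.0] * 18
print(round(crafter_score(rates), 1))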
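Usage sketch for the dependencies above: the Figure 2 commands install the crafter package, which exposes a Gym-style interface. The snippet below is a minimal sketch based on the project's public repository rather than the paper itself; the environment id CrafterReward-v1 (CrafterNoReward-v1 for the unsupervised setting) and the crafter.Recorder logging wrapper are assumptions taken from that repository.

import gym
import crafter  # installed via: python3 -m pip install crafter

# Assumed environment id from the public repository; importing crafter registers it.
env = gym.make('CrafterReward-v1')  # 'CrafterNoReward-v1' for the unsupervised setting
# Assumed logging wrapper that records per-achievement statistics for scoring.
env = crafter.Recorder(env, './logdir', save_stats=True, save_video=False)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # info reports achievement progress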