Benchmarking the Spectrum of Agent Capabilities
Authors: Danijar Hafner
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores of reward agents and unsupervised agents." (Abstract) |
| Researcher Affiliation | Collaboration | Danijar Hafner Google Research, Brain Team University of Toronto mail@danijar.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website. https://danijar.com/crafter (Section 4, EXPERIMENTS) |
| Open Datasets | Yes | The environment, code for the baseline agents and figures in this paper, and the human dataset are available on the project website. https://danijar.com/crafter (Section 4, EXPERIMENTS) |
| Dataset Splits | No | An agent is granted a budget of 1M environment steps to interact with the environment. The agent performance is evaluated through success rates of the individual achievements throughout its training, as well as an aggregated score. (Section 3.3, EVALUATION PROTOCOL) |
| Hardware Specification | No | All agents trained for 1M environment steps in under 24 hours on a single GPU (Section 4.1, BENCHMARK WITH REWARDS) - no specific model or detailed hardware is mentioned. |
| Software Dependencies | Yes | `python3 -m pip install crafter` (install Crafter); `python3 -m pip install pygame` (needed for the human interface). (Figure 2) |
| Experiment Setup | No | We used its default hyperparameters for Atari and increased the model size. (Section 4.1, BENCHMARK WITH REWARDS) - specific hyperparameter values are not provided directly in the paper but are deferred to external works. |
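The evaluation protocol cited above (Section 3.3) aggregates per-achievement success rates into a single Crafter score. A minimal sketch of that aggregation, assuming the logarithmic (geometric-mean-style) formula S = exp(mean of ln(1 + s_i)) − 1 over success rates s_i given in percent, as described in the paper:

```python
import math

def crafter_score(success_rates):
    """Aggregate per-achievement success rates (percent, 0-100)
    into one score: exp of the mean log of (1 + s_i), minus 1.
    The log compresses easy achievements so rare ones matter more."""
    n = len(success_rates)
    return math.exp(sum(math.log(1 + s) for s in success_rates) / n) - 1

# A uniform 50% success rate across all 22 achievements
# recovers a score of 50 (the formula is exact for uniform inputs).
print(round(crafter_score([50.0] * 22), 2))
```

Because of the logarithm, an agent that unlocks every achievement occasionally scores higher than one that masters a few achievements and never reaches the rest.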