Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Motif's performance and behavior on the challenging, open-ended and procedurally-generated NetHack game. |
| Researcher Affiliation | Collaboration | 1 Mila, 2 FAIR at Meta, 3 UT Austin, 4 Université de Montréal, 5 McGill University |
| Pseudocode | No | The paper describes its method (Motif) and mentions the use of an RL algorithm (PPO), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/facebookresearch/motif |
| Open Datasets | Yes | To further encourage reproducibility and scientific discoveries, we also release our complete Llama 2 annotations for all experiments. |
| Dataset Splits | Yes | We split the dataset of annotation into a training set containing 80% of the datapoints and a validation set containing 20%. |
| Hardware Specification | Yes | Sample Factory includes an extremely fast implementation of PPO (Schulman et al., 2017) which runs at about 20K frames-per-second using 20 computer cores and one V100 GPU. If annotation is done on A100 GPUs the compute costs can be cut approximately in half. |
| Software Dependencies | No | The paper mentions software like "Sample Factory", "Llama 2", and the "vLLM Python module", but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We provide all hyperparameters in Table 2. |
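The Dataset Splits row quotes an 80/20 split of the annotation dataset into training and validation sets. The paper does not specify how the split is performed; the sketch below shows one common way to implement such a split (the function name, seeded shuffle, and fraction handling are assumptions for illustration, not the authors' code).

```python
import random


def split_annotations(datapoints, train_frac=0.8, seed=0):
    """Shuffle annotation datapoints and split them into train/validation sets.

    Assumed helper: an 80/20 split as described in the quote, with a fixed
    seed so the split is reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(datapoints)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Example: 100 annotation datapoints -> 80 train, 20 validation.
train_set, val_set = split_annotations(range(100))
print(len(train_set), len(val_set))
```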