Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Measuring Goal-Directedness
Authors: Matt MacDermott, James Fox, Francesco Belardinelli, Tom Everitt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments." and "We carried out two experiments to measure known-utility MEG with respect to the environment reward function and unknown-utility MEG with respect to a hypothesis class of utility functions." |
| Researcher Affiliation | Collaboration | Matt MacDermott (Imperial College London); James Fox (University of Oxford, London Initiative for Safe AI); Francesco Belardinelli (Imperial College London); Tom Everitt (Google DeepMind) |
| Pseudocode | Yes | Algorithm 1: Known-utility MEG in MDPs, and Algorithm 2: Unknown-utility MEG in MDPs |
| Open Source Code | Yes | Code available at https://github.com/mattmacdermott1/measuring-goal-directedness |
| Open Datasets | Yes | Our experiments measured MEG for various policies in the Cliff World environment from the seals suite [Gleave et al., 2020]. |
| Dataset Splits | No | The paper does not provide specific dataset split information for training, validation, or testing. |
| Hardware Specification | Yes | Hardware model: LENOVO20N2000RUK; Processor: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz, 2112 MHz, 4 Core(s), 8 Logical Processor(s); Memory: 24.0 GB |
| Software Dependencies | No | The paper mentions using ‘SEALS library’ and ‘imitation library’ but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | "We used an MLP with a single hidden layer of size 256 to define a utility function over states." and "considering ε-greedy policies for ε in the range 0.1 to 0.9." |
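
To make the Experiment Setup row concrete, the following is a minimal, standard-library-only sketch of the two ingredients it quotes: a utility function over states defined by an MLP with one hidden layer of size 256, and ε-greedy policies swept over ε from 0.1 to 0.9. It is not the authors' implementation. The ReLU activation, the weight initialisation, the state encoding, the number of actions, the 0.1 step between ε values, and all function names (`init_mlp`, `utility`, `epsilon_greedy`) are assumptions made for illustration only.

```python
import math
import random


def init_mlp(n_inputs, hidden_size=256, seed=0):
    """Randomly initialise a one-hidden-layer MLP mapping a state vector to a scalar.

    The hidden size of 256 comes from the table; the initialisation scheme is assumed.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_inputs)
    w1 = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
          for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-1.0, 1.0) / math.sqrt(hidden_size)
          for _ in range(hidden_size)]
    b2 = 0.0
    return w1, b1, w2, b2


def utility(params, state):
    """Forward pass: ReLU hidden layer (assumed) followed by a linear scalar output."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def epsilon_greedy(greedy_action, n_actions, epsilon):
    """Action probabilities: uniform with probability epsilon, greedy otherwise."""
    probs = [epsilon / n_actions] * n_actions
    probs[greedy_action] += 1.0 - epsilon
    return probs


# Sweep epsilon over 0.1, 0.2, ..., 0.9 (the step size is an assumption).
epsilons = [round(0.1 * k, 1) for k in range(1, 10)]

# Hypothetical 4-action state in which action 0 is the greedy choice.
policies = {eps: epsilon_greedy(greedy_action=0, n_actions=4, epsilon=eps)
            for eps in epsilons}
```

Each entry of `policies` is a probability vector over the four hypothetical actions; larger ε spreads more mass away from the greedy action. `utility(init_mlp(3), [1.0, 0.0, 0.0])` evaluates the untrained network on a one-hot state.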