Not a Number: Identifying Instance Features for Capability-Oriented Evaluation
Authors: Ryan Burnell, John Burden, Danaja Rutar, Konstantinos Voudouris, Lucy Cheke, José Hernández-Orallo
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present a new methodology to identify and build informative instance features that can provide explanatory and predictive power to analyse the behaviour of AI systems more robustly. ... We illustrate this methodology with the Animal-AI competition as a representative example of how we can revisit existing competitions and benchmarks in AI even when evaluation data is sparse. |
| Researcher Affiliation | Academia | ¹Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK; ²Centre for the Study of Existential Risk, University of Cambridge, UK; ³VRAIN, Universitat Politècnica de València, Spain |
| Pseudocode | No | The paper provides a step-by-step summary of its methodology in Section 6 and Appendix A.4, but these are descriptive text points, not structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The appendix includes further results and plots, and can be found with all the code and data on GitHub: https://github.com/RyanBurnell/NotANumber |
| Open Datasets | Yes | As a proof of concept, we apply this methodology to the Animal-AI (AAI) Olympics [Crosby et al., 2020], a competition that evaluated AI agents in a 3D environment across a range of task categories, such as spatial memory and causal reasoning. |
| Dataset Splits | No | The paper discusses splitting data for analysis ('split the data into 75% test and 25% deployment') and mentions building 'predictive models'. However, it does not provide explicit details about training/validation/test splits for reproducibility of their own predictive models, nor does it refer to standard predefined splits for such a purpose. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used to run the experiments or analyses described. |
| Software Dependencies | No | The paper mentions that the 'Animal-AI (AAI) environment is built in Unity [Juliani et al., 2018]', but it does not specify the version of Unity or any other specific software dependencies with their version numbers used for their analysis or predictive modeling. |
| Experiment Setup | No | The paper describes the setup of the Animal-AI competition tasks and environment, which is the subject of their analysis. However, it does not provide details of the experimental setup for their own research, such as hyperparameters or training configurations for the predictive models they built (e.g., C5.0). |
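To make concrete the kind of analysis the table above refers to — instance features feeding a predictive model, with the quoted "75% test and 25% deployment" split — here is a minimal, self-contained sketch. All feature names, data, and the threshold-rule "model" are hypothetical stand-ins (the paper itself used C5.0 decision trees); this illustrates only the split-and-predict pattern, not the authors' actual pipeline.

```python
import random

# Hypothetical instance records: each pairs illustrative instance
# features (e.g. reward distance, obstacle count) with a pass/fail
# outcome. Names and values are invented for this sketch.
random.seed(0)
instances = [
    {"distance": random.uniform(1, 10),
     "obstacles": random.randint(0, 3)}
    for _ in range(200)
]
for inst in instances:
    # Toy ground truth: far rewards behind more obstacles are harder.
    inst["solved"] = inst["distance"] + 2 * inst["obstacles"] < 8

# 75% "test" / 25% "deployment" split, as quoted from the paper.
random.shuffle(instances)
cut = int(0.75 * len(instances))          # 150 / 50
test, deployment = instances[:cut], instances[cut:]

# Stand-in predictive model: a single-feature threshold rule fit on
# the test portion (a placeholder for a real decision-tree learner).
_, threshold = max(
    (sum((i["distance"] < t) == i["solved"] for i in test), t)
    for t in range(1, 11)
)

# Evaluate the fitted rule on the held-out deployment portion.
acc = sum((i["distance"] < threshold) == i["solved"]
          for i in deployment) / len(deployment)
print(f"threshold={threshold}, deployment accuracy={acc:.2f}")
```

The point of the split is that the rule's parameters are chosen on one portion of the instances and its predictive power is measured on the other, which is the property the "Dataset Splits" row notes the paper does not fully specify for its own models.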