Differential Assessment of Black-Box AI Agents
Authors: Rashmeet Kaur Nayyar, Pulkit Verma, Siddharth Srivastava
Pages: 9868-9876
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch. We also show that the cost of differential assessment using our method is proportional to the amount of drift in the agent's functionality. In this section, we evaluate our approach for assessing a black-box agent to learn its model using information about its previous model and available observations. We implemented the DAAISy algorithm in Python and tested it on six planning benchmark domains from the International Planning Competition (IPC). We used the IPC domains as the unknown drifted models and generated six initial domains at random for each domain in our experiments. |
| Researcher Affiliation | Academia | Autonomous Agents and Intelligent Robots Lab, School of Computing and Augmented Intelligence, Arizona State University, AZ, USA {rmnayyar, verma.pulkit, siddharths}@asu.edu |
| Pseudocode | Yes | Algorithm 1: Differential Assessment of AI Systems |
| Open Source Code | Yes | Code available at https://github.com/AAIR-lab/DAAISy |
| Open Datasets | Yes | We used the IPC domains as the unknown drifted models and generated six initial domains at random for each domain in our experiments. Source: International Planning Competition (IPC) benchmark domains. |
| Dataset Splits | No | The paper uses IPC domains and generated initial domains but does not specify a clear train/validation/test split for the data used in its experiments in the traditional sense, such as percentages or sample counts for each split. |
| Hardware Specification | Yes | All of our experiments were executed on 5.0 GHz Intel i9 CPUs with 64 GB RAM running Ubuntu 18.04. |
| Software Dependencies | No | The paper mentions implementing the algorithm in Python and using Fast Downward, but does not provide specific version numbers for these or any other key software libraries or dependencies. |
| Experiment Setup | Yes | To generate this set [of observations], we gave the agent a random problem instance from the IPC corresponding to the domain used by the agent. The agent then used Fast Downward (Helmert 2006) with the LM-Cut heuristic (Helmert and Domshlak 2009) to produce an optimal solution for the given problem. The generated observation trace is provided to DAAISy as input, in addition to a random initial model $M^{A}_{init}$, as discussed in Alg. 1. The exact same observation trace is used in all experiments of the same domain, without knowledge of the agent's drifted model and irrespective of the amount of drift. |
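To make the experimental setup more concrete, the sketch below illustrates two ingredients described above: deriving a randomly drifted "initial" model from a reference model, and using an observation trace to localize which parts of an outdated model the trace contradicts. This is a simplified, hypothetical illustration and not DAAISy's actual algorithm, which works with lifted PDDL models and additionally queries the agent. The grounded encoding (one precondition mode and one effect mode per action/predicate pair) and the names `MODES`, `apply`, `inconsistent_slots`, and `perturb` are assumptions introduced only for this example. It requires Python 3.9+.

```python
import random

# Hypothetical grounded (propositional) model encoding, for illustration only:
# each (action, predicate) pair has a precondition mode and an effect mode.
#   "+" = must be true / made true, "-" = must be false / made false,
#   "0" = not mentioned / left unchanged.
MODES = ("+", "-", "0")
Model = dict[tuple[str, str], dict[str, str]]  # (action, pred) -> {"pre": m, "eff": m}
State = dict[str, bool]
Transition = tuple[State, str, State]  # (state before, action, state after)


def apply(model: Model, state: State, action: str) -> State:
    """Execute `action` in `state` under `model`, raising if a precondition fails."""
    nxt = dict(state)
    for (act, pred), modes in model.items():
        if act != action:
            continue
        if modes["pre"] == "+" and not state[pred]:
            raise ValueError(f"{action}: requires {pred}")
        if modes["pre"] == "-" and state[pred]:
            raise ValueError(f"{action}: requires not {pred}")
        if modes["eff"] != "0":
            nxt[pred] = modes["eff"] == "+"
    return nxt


def inconsistent_slots(model: Model, trace: list[Transition]) -> set[tuple[str, str, str]]:
    """Return the (action, pred, "pre" | "eff") slots of `model` that `trace` contradicts."""
    flagged = set()
    for before, action, after in trace:
        for (act, pred), modes in model.items():
            if act != action:
                continue
            pre, eff = modes["pre"], modes["eff"]
            if (pre == "+" and not before[pred]) or (pre == "-" and before[pred]):
                flagged.add((act, pred, "pre"))
            # Value of `pred` after the action, as predicted by this slot's effect mode.
            predicted = {"+": True, "-": False, "0": before[pred]}[eff]
            if after[pred] != predicted:
                flagged.add((act, pred, "eff"))
    return flagged


def perturb(model: Model, n_changes: int, seed: int = 0) -> Model:
    """Copy `model` and give `n_changes` distinct random slots a different mode."""
    rng = random.Random(seed)
    slots = [(key, comp) for key in sorted(model) for comp in ("pre", "eff")]
    new = {key: dict(modes) for key, modes in model.items()}
    for key, comp in rng.sample(slots, n_changes):
        new[key][comp] = rng.choice([m for m in MODES if m != new[key][comp]])
    return new


if __name__ == "__main__":
    # Toy "drifted" model, playing the role of the agent's true (unknown) model.
    true_model: Model = {
        ("pick", "holding"): {"pre": "-", "eff": "+"},
        ("pick", "ball_on_floor"): {"pre": "+", "eff": "-"},
        ("drop", "holding"): {"pre": "+", "eff": "-"},
        ("drop", "ball_on_floor"): {"pre": "0", "eff": "+"},
    }
    s0 = {"holding": False, "ball_on_floor": True}
    s1 = apply(true_model, s0, "pick")
    s2 = apply(true_model, s1, "drop")
    trace = [(s0, "pick", s1), (s1, "drop", s2)]

    # Outdated model in which "drop" no longer puts the ball back on the floor.
    init_model = {k: dict(v) for k, v in true_model.items()}
    init_model[("drop", "ball_on_floor")]["eff"] = "0"
    print(inconsistent_slots(init_model, trace))  # {('drop', 'ball_on_floor', 'eff')}
```

In this toy setting, only the slots flagged by the trace would need re-learning, while the rest of the previous model could be kept, which mirrors the intuition that assessment cost should scale with the amount of drift. Note that a single trace cannot expose every change (for example, a precondition that happens to hold in every observed state), which is one reason a full method would also need to interact with the agent rather than rely on observations alone.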