reproducibilityindex.ai

Differential Assessment of Black-Box AI Agents

Authors: Rashmeet Kaur Nayyar, Pulkit Verma, Siddharth Srivastava (pp. 9868-9876)

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch. We also show that the cost of differential assessment using our method is proportional to the amount of drift in the agent's functionality. In this section, we evaluate our approach for assessing a black-box agent to learn its model using information about its previous model and available observations. We implemented the algorithm for DAAISy in Python and tested it on six planning benchmark domains from the International Planning Competition (IPC). We used the IPC domains as the unknown drifted models and generated six initial domains at random for each domain in our experiments.
Researcher Affiliation	Academia	Autonomous Agents and Intelligent Robots Lab, School of Computing and Augmented Intelligence, Arizona State University, AZ, USA {rmnayyar, verma.pulkit, siddharths}@asu.edu
Pseudocode	Yes	Algorithm 1: Differential Assessment of AI Systems
Open Source Code	Yes	Code available at https://github.com/AAIR-lab/DAAISy
Open Datasets	Yes	We used the IPC domains as the unknown drifted models and generated six initial domains at random for each domain in our experiments. International Planning Competition (IPC)
Dataset Splits	No	The paper uses IPC domains and generated initial domains but does not specify a clear train/validation/test split for the data used in its experiments in the traditional sense, such as percentages or sample counts for each split.
Hardware Specification	Yes	All of our experiments were executed on 5.0 GHz Intel i9 CPUs with 64 GB RAM running Ubuntu 18.04.
Software Dependencies	No	The paper mentions implementing the algorithm in Python and using Fast Downward, but does not provide specific version numbers for these or any other key software libraries or dependencies.
Experiment Setup	Yes	To generate this set, we gave the agent a random problem instance from the IPC corresponding to the domain used by the agent. The agent then used Fast Downward (Helmert 2006) with LM-Cut heuristic (Helmert and Domshlak 2009) to produce an optimal solution for the given problem. The generated observation trace is provided to DAAISy as input in addition to a random $M^A_{init}$ as discussed in Alg. 1. The exact same observation trace is used in all experiments of the same domain, without the knowledge of the drifted model of the agent, and irrespective of the amount of drift.
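
The trace-generation step described under Experiment Setup (pick a random IPC problem, then solve it optimally with Fast Downward using the LM-Cut heuristic) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' script: the helper names `pick_problem` and `build_fd_command`, the file names, and the location of `fast-downward.py` are all assumptions, and the paper does not state which Fast Downward version or options were used beyond the heuristic.

```python
import random
from typing import List, Sequence


def pick_problem(problems: Sequence[str], seed: int = 0) -> str:
    """Pick one problem instance at random (seeded so the choice is repeatable)."""
    return random.Random(seed).choice(list(problems))


def build_fd_command(domain: str, problem: str,
                     fd_script: str = "fast-downward.py",  # assumed path
                     plan_file: str = "sas_plan") -> List[str]:
    """Build a Fast Downward call for optimal planning: A* search with LM-Cut."""
    return [
        "python3", fd_script,
        "--plan-file", plan_file,      # where the planner writes the plan
        domain, problem,
        "--search", "astar(lmcut())",  # A* with the LM-Cut heuristic
    ]


if __name__ == "__main__":
    problems = ["p01.pddl", "p02.pddl", "p03.pddl"]  # hypothetical file names
    cmd = build_fd_command("domain.pddl", pick_problem(problems))
    print(" ".join(cmd))
    # To actually run the planner: subprocess.run(cmd, check=True)
```

Fixing the seed mirrors the report's note that the same observation trace is reused across all experiments for a domain; the plan written to `sas_plan` would then serve as the source of that trace.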