Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings
Authors: Pulkit Verma, Rushang Karia, Siddharth Srivastava
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented Alg. 1 in Python to evaluate our approach empirically. We found that our query synthesis and interactive learning process leads to (i) few-shot generalization; (ii) convergence to a sound and complete model; and (iii) much greater sample efficiency and accuracy for learning lifted SDM models with complex capabilities as compared to the baseline. |
| Researcher Affiliation | Academia | Pulkit Verma, Rushang Karia, and Siddharth Srivastava. Autonomous Agents and Intelligent Robots Lab, School of Computing and Augmented Intelligence, Arizona State University, AZ, USA. {verma.pulkit, rushang.karia, siddharths}@asu.edu |
| Pseudocode | Yes | Algorithm 1: QACE Algorithm |
| Open Source Code | Yes | Source code available at https://github.com/AAIR-lab/QACE |
| Open Datasets | No | The paper describes creating SDMAs and uses terms like "single training problem" and "test set" composed of problems with varying object counts. It refers to simulators and other research systems used, but it does not provide concrete access information (link, DOI, specific citation with authors/year, or mention of a standard public dataset name with proper attribution) for any publicly available or open datasets used for training. |
| Dataset Splits | No | The paper mentions using a "single training problem" and a "test set". While it discusses the generation of "test samples" for evaluating variational distance, it does not specify any training/validation/test dataset splits (e.g., percentages, sample counts, or predefined splits) or cross-validation setup. |
| Hardware Specification | Yes | We ran the experiments on a cluster of Intel Xeon E5-2680 v4 CPUs with CentOS 7.9 running at 2.4 GHz with a memory limit of 8 GB and a time limit of 4 hours. |
| Software Dependencies | No | The paper mentions implementation in "Python" and use of "CentOS 7.9" and "PRP [Muise et al., 2012] as the FOND planner". However, it does not provide specific version numbers for Python or PRP, nor does it list other software dependencies with their versions. |
| Experiment Setup | Yes | For QACE, we used α = 2d where d is the maximum depth of policies used in queries generated by QACE and η = 5. |