Efficient Querying for Cooperative Probabilistic Commitments
Authors: Qi Zhang, Edmund H. Durfee, Satinder Singh
AAAI 2021, pp. 11378–11386
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations focus on two questions: (1) how effective and efficient is the breakpoints discretization for computing EUS, compared with alternatives? (2) for EUS maximization, how effective and efficient is greedy query search compared with exhaustive search? To answer these questions, Section 6.1 conducts empirical evaluations in synthetic MDPs with minimal assumptions on the structure of the transition and reward functions, and Section 6.2 uses an environment inspired by the video game Overcooked to evaluate the breakpoints discretization and the greedy query search in a more grounded and structured domain. |
| Researcher Affiliation | Academia | ¹Artificial Intelligence Institute, University of South Carolina; ²Computer Science and Engineering, University of Michigan |
| Pseudocode | Yes | Algorithm 1: Binary search for breakpoints (a hedged sketch of the underlying scheme appears below the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic MDPs and modifying an Overcooked environment for experiments. It does not refer to publicly available datasets with specific access information (links, DOIs, or formal citations to standard benchmarks). |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits in the conventional sense of supervised learning. The experiments are conducted on generated MDP environments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers). |
| Experiment Setup | Yes | The horizon for both agents is set to 20. The provider’s environment is a randomly generated MDP. It has 10 states the provider can be in at any time step, one of which is an absorbing state denoted s+, and the initial state is chosen from the non-absorbing states. Feature u takes the value u+ only in the absorbing state, i.e., u = u+ if and only if sp = s+. There are 3 actions. For each state-action pair (sp, ap) with sp ≠ s+, the transition function Pp(·|sp, ap) is determined independently by filling the 10 entries with values drawn uniformly from [0, 1] and normalizing Pp(·|sp, ap). The reward Rp(sp, ap) is sampled uniformly and independently from [0, 1] for a non-absorbing state sp ≠ s+, and is zero for the absorbing state sp = s+. ... The recipient’s environment is a one-dimensional space with 10 locations represented as integers {0, 1, ..., 9}. ... L0 is randomly chosen from locations 1–8 and r0 from the interval (0, 10) to create various MDPs for the recipient. (A NumPy sketch of the provider-MDP generation appears below the table.) |
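The paper's Algorithm 1 performs a binary search for the breakpoints of the recipient's value as a function of the commitment probability, which the paper's analysis treats as a pointwise maximum of finitely many lines and hence convex piecewise-linear. The sketch below is a generic divide-and-conquer version of that scheme, not the authors' code: the oracle name `line_at(p)`, the function name `find_breakpoints`, and the tolerance handling are our assumptions. In the paper's setting, the oracle would be realized by solving the recipient's MDP at commitment probability p and reading off the supporting line.

```python
from typing import Callable, List, Tuple

Line = Tuple[float, float]  # (slope, intercept): a linear piece a * p + b

def find_breakpoints(line_at: Callable[[float], Line],
                     lo: float, hi: float,
                     tol: float = 1e-9) -> List[float]:
    """Locate the breakpoints of a convex piecewise-linear function v(p)
    on [lo, hi], given an oracle line_at(p) that returns the (slope,
    intercept) of the linear piece supporting v at p.
    """
    a1, b1 = line_at(lo)
    a2, b2 = line_at(hi)
    if abs(a1 - a2) < tol:               # one linear piece covers [lo, hi]
        return []
    x = (b1 - b2) / (a2 - a1)            # where the two endpoint lines cross
    ax, bx = line_at(x)
    if abs((ax * x + bx) - (a1 * x + b1)) < tol:
        return [x]                       # v touches the crossing: a breakpoint
    # A third piece lies strictly above the crossing: split and recurse.
    return (find_breakpoints(line_at, lo, x, tol)
            + find_breakpoints(line_at, x, hi, tol))
```

A toy usage example, with v(p) built directly as a maximum over three hand-picked lines rather than from an MDP:

```python
# v(p) = max(1 - p, 0.5, 2p - 1) has breakpoints at p = 0.5 and p = 0.75.
pieces = [(-1.0, 1.0), (0.0, 0.5), (2.0, -1.0)]
line_at = lambda p: max(pieces, key=lambda ab: ab[0] * p + ab[1])
print(find_breakpoints(line_at, 0.0, 1.0))  # [0.5, 0.75]
```

Each oracle call either certifies an interval as a single piece or discovers a new piece, so the number of calls grows linearly with the number of breakpoints rather than with any fixed grid resolution.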
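For concreteness, here is a minimal NumPy sketch of the provider-MDP generation procedure quoted in the Experiment Setup row. Since the paper releases no code, the function name `make_random_provider_mdp`, the array layout, the choice of the last state index as s+, and the seeding are our own illustrative choices.

```python
import numpy as np

def make_random_provider_mdp(n_states=10, n_actions=3, horizon=20, seed=None):
    """Sample a synthetic provider MDP as described in the paper's setup.

    The last state index plays the role of the absorbing state s+; all
    names here are illustrative, not from the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    s_plus = n_states - 1                 # absorbing state s+

    # Transition tensor P[s, a, s']: entries drawn uniformly from [0, 1],
    # then normalized over s' for each (s, a) pair.
    P = rng.uniform(size=(n_states, n_actions, n_states))
    P /= P.sum(axis=-1, keepdims=True)
    P[s_plus] = 0.0                       # s+ transitions only to itself
    P[s_plus, :, s_plus] = 1.0

    # Rewards R[s, a]: uniform on [0, 1] for non-absorbing states, zero at s+.
    R = rng.uniform(size=(n_states, n_actions))
    R[s_plus] = 0.0

    s0 = int(rng.integers(0, s_plus))     # initial state is non-absorbing
    return P, R, s0, horizon

P, R, s0, H = make_random_provider_mdp(seed=0)
```

Passing an explicit `seed` makes each generated MDP reproducible, which matters here because the paper's experiments average over many independently sampled environments.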