Efficient Querying for Cooperative Probabilistic Commitments

Authors: Qi Zhang, Edmund H. Durfee, Satinder Singh (pp. 11378-11386)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations focus on two questions: for EUS maximization, how effective and efficient is the breakpoints discretization compared with alternatives, and how effective and efficient is greedy query search compared with exhaustive search? To answer them, Section 6.1 conducts empirical evaluations in synthetic MDPs with minimal assumptions on the structure of the transition and reward functions, and Section 6.2 uses an environment inspired by the video game Overcooked to evaluate the breakpoints discretization and the greedy query search in a more grounded and structured domain. (Hedged sketches of the breakpoint search and of the greedy/exhaustive query searches follow the table.)
Researcher Affiliation | Academia | (1) Artificial Intelligence Institute, University of South Carolina; (2) Computer Science and Engineering, University of Michigan
Pseudocode | Yes | Algorithm 1: Binary search for breakpoints (see the sketch after the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | No | The paper describes generating synthetic MDPs and modifying an Overcooked environment for its experiments. It does not refer to publicly available datasets with specific access information (links, DOIs, or formal citations to standard benchmarks).
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits in the conventional sense of supervised learning. The experiments are conducted on generated MDP environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers).
Experiment Setup | Yes | The horizon for both agents is set to 20. The provider's environment is a randomly generated MDP. It has 10 states the provider can be in at any time step, one of which is an absorbing state denoted s+, and the initial state is chosen from the non-absorbing states. Feature u takes the value u+ only in the absorbing state, i.e., u(s^p) = u+ if and only if s^p = s+. There are 3 actions. For each state-action pair (s^p, a^p) with s^p ≠ s+, the transition function P^p(·|s^p, a^p) is determined independently by filling its 10 entries with values drawn uniformly from [0, 1] and normalizing P^p(·|s^p, a^p). The reward R^p(s^p, a^p) is sampled uniformly and independently from [0, 1] for each non-absorbing state s^p ≠ s+, and is zero for the absorbing state s^p = s+. ... The recipient's environment is a one-dimensional space with 10 locations represented as integers {0, 1, ..., 9}. ... l0 is randomly chosen from locations 1-8 and r0 from the interval (0, 10) to create various MDPs for the recipient. (A minimal re-creation of the provider MDP generation follows the table.)
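
The Pseudocode row cites Algorithm 1, a binary search for breakpoints. As a rough illustration of how such a search can work, the sketch below assumes the recipient's optimal value is a convex, piecewise-linear function of the commitment probability p, and that a hypothetical helper line_at(p) returns the coefficients (a, b) of the maximizing value line at p (e.g., obtained by solving the recipient's MDP under that commitment). This is a minimal sketch under those assumptions, not the authors' algorithm verbatim.

```python
def breakpoints(line_at, lo=0.0, hi=1.0, eps=1e-9):
    """Recover the breakpoints of a convex piecewise-linear function
    v(p) = max over policies of (a + b*p) on the interval [lo, hi].

    line_at(p) is a hypothetical helper returning the coefficients
    (a, b) of the maximizing line at probability p.
    """
    a_lo, b_lo = line_at(lo)
    a_hi, b_hi = line_at(hi)
    if abs(b_lo - b_hi) <= eps:
        return []  # same line maximizes at both ends: no breakpoint inside
    p = (a_hi - a_lo) / (b_lo - b_hi)  # intersection of the two endpoint lines
    a_p, b_p = line_at(p)
    # if the maximizing line at p matches the endpoint lines' value there,
    # the two lines cover the interval and p is the lone breakpoint
    if abs((a_p + b_p * p) - (a_lo + b_lo * p)) <= eps:
        return [p]
    # otherwise a third line dominates near p: split and recurse
    return breakpoints(line_at, lo, p, eps) + breakpoints(line_at, p, hi, eps)
```

Each recursive call either certifies that the two endpoint lines span the interval (so their intersection is a breakpoint) or splits at the intersection and recurses, so the number of MDP solves scales with the number of breakpoints rather than with a fixed grid resolution.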
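For the greedy-versus-exhaustive comparison mentioned in the Research Type row, the structure of the two searches can be sketched as follows, assuming a black-box scoring function eus(query) for the paper's EUS objective and a finite pool of candidate commitments; both names are placeholders, not the authors' API.

```python
from itertools import combinations

def greedy_query(candidates, eus, k):
    """Grow a size-k query one commitment at a time, each step adding
    the candidate that most improves the EUS score of the query so far."""
    query, pool = [], list(candidates)
    for _ in range(k):
        best = max(pool, key=lambda c: eus(query + [c]))
        query.append(best)
        pool.remove(best)
    return query

def exhaustive_query(candidates, eus, k):
    """Baseline: score every size-k subset of candidates and keep the best."""
    return max((list(q) for q in combinations(candidates, k)), key=eus)
```

The greedy search evaluates on the order of k times the pool size queries, versus the combinatorial number of subsets enumerated by the exhaustive baseline; this is the effectiveness/efficiency trade-off the paper's experiments probe.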
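Finally, the provider-side setup quoted in the Experiment Setup row is concrete enough to re-create. The sketch below follows that description (10 states with one absorbing state s+, 3 actions, Uniform[0,1] transition entries normalized per state-action pair, Uniform[0,1] rewards that are zero in s+); the function name and the use of NumPy are my assumptions, and placing s+ at the last index is an arbitrary convention.

```python
import numpy as np

def random_provider_mdp(n_states=10, n_actions=3, seed=0):
    """Generate a provider MDP per the described setup (a re-creation
    under stated assumptions, not the authors' code)."""
    rng = np.random.default_rng(seed)
    s_plus = n_states - 1                     # index chosen for the absorbing state s+
    # fill every P(.|s, a) with Uniform[0,1] entries, then normalize
    P = rng.uniform(size=(n_states, n_actions, n_states))
    P /= P.sum(axis=-1, keepdims=True)
    P[s_plus] = 0.0
    P[s_plus, :, s_plus] = 1.0                # overwrite s+'s rows so it is absorbing
    # Uniform[0,1] rewards for non-absorbing states, zero in s+
    R = rng.uniform(size=(n_states, n_actions))
    R[s_plus] = 0.0
    s0 = int(rng.integers(0, s_plus))         # initial state drawn from non-absorbing states
    return P, R, s0
```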