Efficient Querying for Cooperative Probabilistic Commitments

Authors: Qi Zhang, Edmund H. Durfee, Satinder Singh (pp. 11378-11386)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations focus on two questions: for EUS maximization, how effective and efficient is the breakpoints discretization compared with alternatives, and how effective and efficient is greedy query search compared with exhaustive search? To answer them, Section 6.1 conducts empirical evaluations in synthetic MDPs with minimal assumptions on the structure of the transition and reward functions, and Section 6.2 uses an environment inspired by the video game Overcooked to evaluate the breakpoints discretization and the greedy query search in a more grounded and structured domain. (Hedged sketches of the breakpoint search and of the greedy/exhaustive query searches follow the table.)
Researcher Affiliation | Academia | (1) Artificial Intelligence Institute, University of South Carolina; (2) Computer Science and Engineering, University of Michigan
Pseudocode | Yes | Algorithm 1: Binary search for breakpoints (see the sketch after the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | No | The paper describes generating synthetic MDPs and modifying an Overcooked environment for its experiments. It does not refer to publicly available datasets with specific access information (links, DOIs, or formal citations to standard benchmarks).
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits in the conventional sense of supervised learning. The experiments are conducted on generated MDP environments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers).
Experiment Setup | Yes | The horizon for both agents is set to 20. The provider's environment is a randomly generated MDP. It has 10 states the provider can be in at any time step, one of which is an absorbing state denoted s+, and the initial state is chosen from the non-absorbing states. Feature u takes the value u+ only in the absorbing state, i.e., u(s^p) = u+ if and only if s^p = s+. There are 3 actions. For each state-action pair (s^p, a^p) with s^p ≠ s+, the transition function P^p(·|s^p, a^p) is determined independently by filling its 10 entries with values drawn uniformly from [0, 1] and normalizing P^p(·|s^p, a^p). The reward R^p(s^p, a^p) is sampled uniformly and independently from [0, 1] for each non-absorbing state s^p ≠ s+, and is zero for the absorbing state s^p = s+. ... The recipient's environment is a one-dimensional space with 10 locations represented as integers {0, 1, ..., 9}. ... l0 is randomly chosen from locations 1-8 and r0 from the interval (0, 10) to create various MDPs for the recipient. (A minimal re-creation of the provider MDP generation follows the table.)
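
The Pseudocode row cites Algorithm 1, a binary search for breakpoints. As a rough illustration of how such a search can work, the sketch below assumes the recipient's optimal value is a convex, piecewise-linear function of the commitment probability p, and that a hypothetical helper line_at(p) returns the coefficients (a, b) of the maximizing value line at p (e.g., obtained by solving the recipient's MDP under that commitment). This is a minimal sketch under those assumptions, not the authors' algorithm verbatim.

```python
def breakpoints(line_at, lo=0.0, hi=1.0, eps=1e-9):
    """Recover the breakpoints of a convex piecewise-linear function
    v(p) = max over policies of (a + b*p) on the interval [lo, hi].

    line_at(p) is a hypothetical helper returning the coefficients
    (a, b) of the maximizing line at probability p.
    """
    a_lo, b_lo = line_at(lo)
    a_hi, b_hi = line_at(hi)
    if abs(b_lo - b_hi) <= eps:
        return []  # same line maximizes at both ends: no breakpoint inside
    p = (a_hi - a_lo) / (b_lo - b_hi)  # intersection of the two endpoint lines
    a_p, b_p = line_at(p)
    # if the maximizing line at p matches the endpoint lines' value there,
    # the two lines cover the interval and p is the lone breakpoint
    if abs((a_p + b_p * p) - (a_lo + b_lo * p)) <= eps:
        return [p]
    # otherwise a third line dominates near p: split and recurse
    return breakpoints(line_at, lo, p, eps) + breakpoints(line_at, p, hi, eps)
```

Each recursive call either certifies that the two endpoint lines span the interval (so their intersection is a breakpoint) or splits at the intersection and recurses, so the number of MDP solves scales with the number of breakpoints rather than with a fixed grid resolution.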
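For the greedy-versus-exhaustive comparison mentioned in the Research Type row, the structure of the two searches can be sketched as follows, assuming a black-box scoring function eus(query) for the paper's EUS objective and a finite pool of candidate commitments; both names are placeholders, not the authors' API.

```python
from itertools import combinations

def greedy_query(candidates, eus, k):
    """Grow a size-k query one commitment at a time, each step adding
    the candidate that most improves the EUS score of the query so far."""
    query, pool = [], list(candidates)
    for _ in range(k):
        best = max(pool, key=lambda c: eus(query + [c]))
        query.append(best)
        pool.remove(best)
    return query

def exhaustive_query(candidates, eus, k):
    """Baseline: score every size-k subset of candidates and keep the best."""
    return max((list(q) for q in combinations(candidates, k)), key=eus)
```

The greedy search evaluates on the order of k times the pool size queries, versus the combinatorial number of subsets enumerated by the exhaustive baseline; this is the effectiveness/efficiency trade-off the paper's experiments probe.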
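Finally, the provider-side setup quoted in the Experiment Setup row is concrete enough to re-create. The sketch below follows that description (10 states with one absorbing state s+, 3 actions, Uniform[0,1] transition entries normalized per state-action pair, Uniform[0,1] rewards that are zero in s+); the function name and the use of NumPy are my assumptions, and placing s+ at the last index is an arbitrary convention.

```python
import numpy as np

def random_provider_mdp(n_states=10, n_actions=3, seed=0):
    """Generate a provider MDP per the described setup (a re-creation
    under stated assumptions, not the authors' code)."""
    rng = np.random.default_rng(seed)
    s_plus = n_states - 1                     # index chosen for the absorbing state s+
    # fill every P(.|s, a) with Uniform[0,1] entries, then normalize
    P = rng.uniform(size=(n_states, n_actions, n_states))
    P /= P.sum(axis=-1, keepdims=True)
    P[s_plus] = 0.0
    P[s_plus, :, s_plus] = 1.0                # overwrite s+'s rows so it is absorbing
    # Uniform[0,1] rewards for non-absorbing states, zero in s+
    R = rng.uniform(size=(n_states, n_actions))
    R[s_plus] = 0.0
    s0 = int(rng.integers(0, s_plus))         # initial state drawn from non-absorbing states
    return P, R, s0
```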