The Update-Equivalence Framework for Decision-Time Planning

Authors: Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we add further evidence for the update-equivalence framework's utility by showing that the novel DTP algorithms derived from it also perform well in practice. We focus on two settings with imperfect information: i) two variants of Hanabi (Bard et al., 2020), a fully cooperative card game in which PBS-based DTP approaches are considered state-of-the-art; and ii) 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe, 2p0s games with virtually no public information."
Researcher Affiliation | Collaboration | Samuel Sokota (1), Gabriele Farina (2), David J. Wu, Hengyuan Hu (3), Kevin A. Wang (4), J. Zico Kolter (1, 5), Noam Brown (6). Work done at Meta AI. (1) Carnegie Mellon University, (2) Massachusetts Institute of Technology, (3) Stanford University, (4) Brown University, (5) Bosch AI, (6) OpenAI
Pseudocode | Yes | "Algorithm 1 Update Equivalent Search for Last-Iterate Algorithm with Action-Value Feedback and Update U"
Open Source Code | No | The paper neither states that its source code will be released nor links to a code repository for the described methodology.
Open Datasets | Yes | "We test MDS in Hanabi (Bard et al., 2020), the standard benchmark for search in fully cooperative imperfect-information games."
Dataset Splits | No | The paper describes training models and using standard benchmarks, but it does not specify training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not give hardware details (e.g., GPU/CPU models, memory amounts) for its experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as PPO, DQN, NFSP, and OpenSpiel, but it does not give version numbers for its software dependencies (e.g., "OpenSpiel vX.Y.Z").
Experiment Setup | Yes | "For our Hanabi experiments, we used η = 20 for the MDS results in Tables 1 and 2. We performed search with 10,000 samples."
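
To make the role of the two reported hyperparameters (η = 20 and 10,000 samples) more concrete, here is a minimal sketch of a single decision-time update. It assumes that MDS applies a mirror-descent-style step with a KL (negative-entropy) regularizer to a prior policy, using action values estimated by Monte Carlo sampling. This is an illustration under those assumptions, not the paper's implementation: the function names, the `sample_return` callback, and the even split of the sample budget across actions are all hypothetical.

```python
import math
import random


def estimate_action_values(actions, sample_return, num_samples, rng):
    """Average sampled returns per action.

    Assumption: the sample budget is split evenly across actions.
    `sample_return(action, rng)` is a hypothetical rollout callback.
    """
    per_action = max(1, num_samples // len(actions))
    return {
        a: sum(sample_return(a, rng) for _ in range(per_action)) / per_action
        for a in actions
    }


def mirror_descent_update(prior, q_values, eta):
    """One KL mirror descent step: new(a) is proportional to prior(a) * exp(eta * q(a)).

    Actions with zero prior probability keep zero probability.
    """
    # Work in log space and subtract the max for numerical stability.
    logits = {a: math.log(p) + eta * q_values[a] for a, p in prior.items() if p > 0}
    max_logit = max(logits.values())
    weights = {a: math.exp(l - max_logit) for a, l in logits.items()}
    total = sum(weights.values())
    return {a: weights.get(a, 0.0) / total for a in prior}
```

A caller would pass the prior policy at the current decision point, the estimates returned by `estimate_action_values` with `num_samples=10_000`, and `eta=20.0`. A larger η moves the updated policy further from the prior toward the action with the highest estimated value.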