The Update-Equivalence Framework for Decision-Time Planning
Authors: Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we add further evidence for the update-equivalence framework's utility by showing that the novel DTP algorithms derived from it also perform well in practice. We focus on two settings with imperfect information: i) two variants of Hanabi (Bard et al., 2020), a fully cooperative card game in which PBS-based DTP approaches are considered state-of-the-art; and ii) 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe, 2p0s games with virtually no public information. |
| Researcher Affiliation | Collaboration | Samuel Sokota (1), Gabriele Farina (2), David J. Wu, Hengyuan Hu (3), Kevin A. Wang (4), J. Zico Kolter (1, 5), Noam Brown (6). Work done at Meta AI. Affiliations: (1) Carnegie Mellon University; (2) Massachusetts Institute of Technology; (3) Stanford University; (4) Brown University; (5) Bosch AI; (6) OpenAI |
| Pseudocode | Yes | Algorithm 1 Update Equivalent Search for Last-Iterate Algorithm with Action-Value Feedback and Update U |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We test MDS in Hanabi (Bard et al., 2020), the standard benchmark for search in fully cooperative imperfect-information games. |
| Dataset Splits | No | The paper describes training models and using standard benchmarks, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various software components and frameworks like PPO, DQN, NFSP, and OpenSpiel, but it does not provide specific version numbers for these software dependencies (e.g., 'OpenSpiel vX.Y.Z'). |
| Experiment Setup | Yes | For our Hanabi experiments, we used η = 20 for the MDS results in Tables 1 and 2. We performed search with 10,000 samples. |
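
The Pseudocode and Experiment Setup rows can be made more concrete with a small sketch. It assumes that MDS applies a KL-regularized mirror descent step to a blueprint policy, using action values estimated from sampled rollouts; the table itself does not spell this out. The function names, the toy rollout, the action labels, and the even split of samples across actions are all hypothetical and are not taken from the paper. Only the step size η = 20 and the budget of 10,000 samples come from the table.

```python
import math
import random


def estimate_action_values(rollout, actions, num_samples, rng):
    """Monte Carlo estimate of q(a): the mean return of rollout(a, rng).

    Assumption: the sample budget is split evenly across actions.
    """
    per_action = max(1, num_samples // len(actions))
    return {
        a: sum(rollout(a, rng) for _ in range(per_action)) / per_action
        for a in actions
    }


def mirror_descent_update(policy, q_values, eta):
    """One KL mirror descent step: new(a) is proportional to policy(a) * exp(eta * q(a))."""
    # Work in log space and subtract the max for numerical stability.
    logits = {a: math.log(p) + eta * q_values[a] for a, p in policy.items() if p > 0}
    shift = max(logits.values())
    weights = {a: math.exp(v - shift) for a, v in logits.items()}
    total = sum(weights.values())
    # Actions with zero prior probability keep zero probability.
    return {a: weights.get(a, 0.0) / total for a in policy}


# Toy usage with hypothetical actions and returns (not Hanabi).
rng = random.Random(0)
blueprint = {"play": 0.4, "discard": 0.4, "hint": 0.2}
toy_mean_return = {"play": 0.10, "discard": 0.00, "hint": 0.15}


def toy_rollout(action, rng):
    # Noisy return around a fixed mean; stands in for a game rollout.
    return toy_mean_return[action] + rng.gauss(0.0, 1.0)


q_hat = estimate_action_values(toy_rollout, list(blueprint), 10_000, rng)
searched_policy = mirror_descent_update(blueprint, q_hat, eta=20.0)
print(searched_policy)
```

With a step size as large as 20, even small gaps between estimated action values move most of the probability mass toward the best-looking action, so the sample budget largely determines how reliable the resulting policy is.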