Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Update-Equivalence Framework for Decision-Time Planning
Authors: Samuel Sokota, Gabriele Farina, David J Wu, Hengyuan Hu, Kevin A. Wang, J Zico Kolter, Noam Brown
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we add further evidence for the update-equivalence framework's utility by showing that the novel DTP algorithms derived from it also perform well in practice. We focus on two settings with imperfect information: i) two variants of Hanabi (Bard et al., 2020), a fully cooperative card game in which PBS-based DTP approaches are considered state-of-the-art; and ii) 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe, 2p0s games with virtually no public information. |
| Researcher Affiliation | Collaboration | Samuel Sokota¹, Gabriele Farina², David J. Wu, Hengyuan Hu³, Kevin A. Wang⁴, J. Zico Kolter¹,⁵, Noam Brown⁶. Work done at Meta AI. ¹Carnegie Mellon University, ²Massachusetts Institute of Technology, ³Stanford University, ⁴Brown University, ⁵Bosch AI, ⁶OpenAI |
| Pseudocode | Yes | Algorithm 1: Update Equivalent Search for Last-Iterate Algorithm with Action-Value Feedback and Update U |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We test MDS in Hanabi (Bard et al., 2020), the standard benchmark for search in fully cooperative imperfect-information games. |
| Dataset Splits | No | The paper describes training models and using standard benchmarks, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions various software components and frameworks like PPO, DQN, NFSP, and OpenSpiel, but it does not provide specific version numbers for these software dependencies (e.g., 'OpenSpiel vX.Y.Z'). |
| Experiment Setup | Yes | For our Hanabi experiments, we used η = 20 for the MDS results in Tables 1 and 2. We performed search with 10,000 samples. |
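
To make the roles of η and the sample budget in the Experiment Setup row concrete, here is a minimal sketch of one search step. It assumes MDS applies an entropic mirror descent update, `new(a) ∝ policy(a) * exp(η * q(a))`, to action values estimated by Monte Carlo rollouts. The update form, the even split of samples across actions, the function names, and the toy return model are all illustrative assumptions; they are not taken from the paper's code, which is not released.

```python
import math
import random


def estimate_action_values(sample_return, actions, num_samples):
    """Monte Carlo estimate of q(a), averaging rollouts per action.

    Assumption: the sample budget is split evenly across actions.
    """
    per_action = max(1, num_samples // len(actions))
    return {
        a: sum(sample_return(a) for _ in range(per_action)) / per_action
        for a in actions
    }


def mirror_descent_step(policy, q_values, eta):
    """One entropic mirror descent step: new(a) ∝ policy(a) * exp(eta * q(a)).

    Assumption: this is a generic form of the update, not the paper's exact one.
    Actions with zero prior probability stay at zero.
    """
    logits = {
        a: math.log(p) + eta * q_values[a] for a, p in policy.items() if p > 0.0
    }
    shift = max(logits.values())  # subtract the max for numerical stability
    weights = {a: math.exp(logit - shift) for a, logit in logits.items()}
    total = sum(weights.values())
    return {a: weights.get(a, 0.0) / total for a in policy}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy returns standing in for rollouts under a blueprint policy.
    true_values = {"play": 0.6, "discard": 0.5, "hint": 0.4}

    def sample_return(action):
        return true_values[action] + rng.gauss(0.0, 0.1)

    blueprint = {a: 1.0 / len(true_values) for a in true_values}
    q = estimate_action_values(sample_return, list(blueprint), num_samples=10_000)
    print(mirror_descent_step(blueprint, q, eta=20.0))
```

Under this form of the update, a larger η moves the searched policy further from the blueprint toward the action with the highest estimated value, while more samples reduce the noise in the estimates that the update amplifies.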