Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Counterfactual Online Learning for Open-Loop Monte-Carlo Planning
Authors: Thomy Phan, Shao-Hung Chan, Sven Koenig
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CORAL in four POMDP benchmark scenarios and compare it with closed-loop and open-loop alternatives. |
| Researcher Affiliation | Academia | ¹University of Southern California, ²University of California, Irvine |
| Pseudocode | Yes | Algorithm 1: Thompson Sampling; Algorithm 2: CORAL Planning |
| Open Source Code | Yes | Code: github.com/thomyphan/counterfactual-planning |
| Open Datasets | Yes | Tag is a gridworld challenge where the agent has to find and tag a target that intentionally moves away. ... RockSample[K,L] consists of a K × K grid with L rocks (Smith and Simmons 2004). ... PocMan is a partially observable version of Pac Man (Silver and Veness 2010). |
| Dataset Splits | No | The paper evaluates its method using simulation environments and does not mention static dataset splits (training/test/validation) in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and existing implementations (e.g., 'POMCP and POMCPOW implementations are based on the code from (Silver and Veness 2010)', 'open-loop MCTS implementation from (Phan et al. 2019a)') but does not specify software names with version numbers. |
| Experiment Setup | Yes | For each environment, we use γ = 0.95 and the respective preferred action sets A_pref (Silver and Veness 2010). ... Unless stated otherwise, we always set η = 50% for CORAL. ... We ran experiments for different simulation budgets n_b. For each budget n_b, we tested 100 random instances to report the average performance and the 95% confidence interval. |
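
The reporting protocol quoted in the Experiment Setup row (100 random instances per budget, average performance with a 95% confidence interval, γ = 0.95) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: `run_instance` is a hypothetical stand-in that returns random rewards, the performance measure is assumed to be the discounted return, and the interval uses the normal approximation 1.96 · s / √n, which the quoted text does not confirm.

```python
import math
import random
import statistics

GAMMA = 0.95  # discount factor quoted in the Experiment Setup row


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over one episode's reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def mean_with_ci95(values):
    """Return (mean, half_width) using the normal approximation.

    Assumption: half_width = 1.96 * sample_std / sqrt(n). The source does
    not say which interval estimator was used.
    """
    n = len(values)
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


def run_instance(seed, horizon=20):
    """Hypothetical placeholder for one planning episode.

    A real evaluation would run the planner in a POMDP simulator; here we
    only draw random rewards so the sketch is self-contained.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(horizon)]


# 100 random instances, as in the quoted setup.
returns = [discounted_return(run_instance(seed)) for seed in range(100)]
mean, half_width = mean_with_ci95(returns)
print(f"average discounted return: {mean:.3f} ± {half_width:.3f}")
```

Repeating the last three lines once per simulation budget would give one mean and interval per budget, which matches the shape of the reported results but not necessarily their exact computation.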