Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Abstraction for Bayesian Reinforcement Learning in Factored POMDPs
Authors: Rolf A. N. Starre, Sammie Katt, Mustafa Mert Çelikok, Marco Loog, Frans A. Oliehoek
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate two key benefits. First, abstraction reduces model size, enabling faster simulations and thus more planning simulations within a fixed runtime. Second, abstraction enhances performance even with a fixed number of simulations due to greater statistical strength. These results underscore the potential of abstraction to improve both the scalability and effectiveness of Bayesian reinforcement learning in factored POMDPs. |
| Researcher Affiliation | Academia | Rolf A. N. Starre (Delft University of Technology); Sammie Katt (Aalto University); Mustafa Mert Çelikok (Delft University of Technology); Marco Loog (Radboud University); Frans A. Oliehoek (Delft University of Technology) |
| Pseudocode | Yes | Algorithm 1 Initialize Abstract Particle Filter... Algorithm 2 Abstract... Algorithm 3 SIS with Abstraction... Algorithm 4 Get Subset K... Algorithm 5 Sequential Importance Sampling... Algorithm 6 Initialize Particle Filter... Algorithm 7 FBA-POMCP... Algorithm 8 Simulate... Algorithm 9 Step (with abstraction) |
| Open Source Code | No | The software is written in C++. The paper does not provide a link to the source code or an explicit statement about its public release. |
| Open Datasets | No | The paper uses custom-designed environments described within the text, such as the "Corridor domain," "Cracky Pavement Gridworld," "Collision Avoidance," and "Room Configuration." It does not refer to or provide access information for any external, publicly available datasets. |
| Dataset Splits | No | The paper describes experiments conducted within custom-designed simulation environments over a certain number of 'episodes' and 'runs,' but it does not utilize or specify traditional training/test/validation splits of a dataset, as is common in supervised learning contexts. Therefore, specific dataset split information is not applicable or provided. |
| Hardware Specification | Yes | We performed the experiments with a fixed number of simulations on three different machines: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz with 384GB RAM, Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz with 190GB RAM, and AMD EPYC 7452 32-Core Processor CPU @ 2.0GHz with 256GB RAM. For the experiments with a fixed amount of computation time, we used (2 cores of) an AMD EPYC 7452 32-Core Processor CPU @ 1.5GHz with 512GB RAM. |
| Software Dependencies | No | The paper states that "The software is written in C++" but does not specify any libraries, frameworks, or their version numbers that were used in the implementation. |
| Experiment Setup | Yes | The settings of the experiments, and some specifics of the environments, are detailed in Table 6. In the table, γ denotes the discount factor, the UCT constant is the exploration constant, and the log-likelihood threshold is the threshold below which reinvigoration is triggered. Table 6: Fixed experiment settings (values listed in the order Corridor / Cracky Pavement / Collision / Room Conf). γ: 0.95 / 0.95 / 0.95 / 0.95; # of particles in belief: 500 / 500 / 500 / 10000; # of episodes: 100 / 500 / 500 / 50; # of runs: 100 / 100 / 10000 / 1000; Horizon (H): 20 / 12 / 20 / 13; UCT constant: 5 / 1 / 500 / 10; Reinvigoration: Yes / No / No / Yes; Log-likelihood threshold: -1500 / N/A / N/A / -500; State factors: 5 / [23, 83] / 7 / 15; \|S\|: 320 / [5 × 10^7, 6 × 10^25] / 6000 / 196608. |
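
To make the roles of the UCT constant and the log-likelihood threshold in the Experiment Setup row concrete, the following C++17 sketch encodes two columns of Table 6 and shows how such settings might be consumed. It is not the authors' code, which is not publicly linked. The struct `ExperimentSettings`, its field names, and the helpers `ucb1_score` and `needs_reinvigoration` are hypothetical, and the scoring function assumes the standard UCB1 rule commonly used by POMCP-style planners rather than anything confirmed by the paper.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical container for one column of Table 6; the field names are
// illustrative and do not come from the paper's implementation.
struct ExperimentSettings {
    double discount;       // gamma, the discount factor
    int num_particles;     // number of particles in the belief
    int num_episodes;
    int num_runs;
    int horizon;           // H
    double uct_constant;   // exploration constant
    // Empty when reinvigoration is disabled (listed as N/A in Table 6).
    std::optional<double> reinvigoration_threshold;
};

// Values copied from the Corridor column of Table 6.
inline ExperimentSettings corridor_settings() {
    return {0.95, 500, 100, 100, 20, 5.0, -1500.0};
}

// Values copied from the Cracky Pavement column of Table 6.
inline ExperimentSettings cracky_pavement_settings() {
    return {0.95, 500, 500, 100, 12, 1.0, std::nullopt};
}

// Assumed standard UCB1 score: mean return plus an exploration bonus
// scaled by the exploration constant c.
inline double ucb1_score(double mean_return, int parent_visits,
                         int action_visits, double c) {
    assert(parent_visits > 0 && action_visits > 0);
    return mean_return +
           c * std::sqrt(std::log(static_cast<double>(parent_visits)) /
                         action_visits);
}

// Reinvigoration is triggered only when it is enabled and the
// log-likelihood falls below the configured threshold.
inline bool needs_reinvigoration(double log_likelihood,
                                 const ExperimentSettings& s) {
    return s.reinvigoration_threshold.has_value() &&
           log_likelihood < *s.reinvigoration_threshold;
}
```

With these definitions, `needs_reinvigoration(-1600.0, corridor_settings())` is true, while the same call with `cracky_pavement_settings()` is always false because that domain runs without reinvigoration. Using `std::optional` for the threshold is one way to represent the N/A entries without picking a sentinel value.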