Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bidirectional Search while Ensuring Meet-In-The-Middle via Effective and Efficient-to-Compute Termination Conditions

Authors: Yi Wang, Eyal Weiss, Bingxian Mu, Oren Salzman

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we compare MEET with A*, MM and BAE*, where the latter was chosen as a representative of the non MM family of Bi-HS algorithms that provides state-of-the-art performance [Alc azar et al., 2020]. Additionally, we conducted an ablation study for MEET to separately assess the impact of using the h heuristic instead of the static h heuristic and the impact of the pruning mechanism. We ran experiments on two distinct domains: grid maps and 10-Pancake puzzles with all algorithms implemented in C++ on a desktop with 16GB RAM and an Intel i7 CPU running Ubuntu 20.04. The implementations of A*, MM, and BAE* were adapted from the source code of [Uras and Koenig, 2015], [Holte et al., 2017], and [Hu and Speck, 2022], respectively. Table 1 displays summarized experimental results for runtime statistics, while Fig. 2, 3, and 4 illustrate, respectively, the distributions of each triggered TC, the average percentages of runtime reduction achieved by each triggered TC vs. the baseline TC1, and detailed time distributions.
Researcher Affiliation Academia Yi Wang1 , Eyal Weiss2 , Bingxian Mu3 and Oren Salzman2 1University of New Hampshire, New Hampshire, USA 2Technion-Israel Institute of Technology, Haifa, Israel 3University of Prince Edward Island, Prince Edward Island, Canada EMAIL, {eweiss@campus,osalzman@cs}.technion.ac.il, EMAIL
Pseudocode Yes Algorithm 1: Generic Framework for MM-Optimal Bi-HS Input: G = (S, E, c), sstart, sgoal Output: Solution π 1 Insert sstart into OPENF and sgoal into OPENB using priority function 2 while both OPENF and OPENB are not empty do 3 Select s OPENF OPENB with min priority function 4 if To Terminate then 5 return Reconstruct Path 6 if s OPENF then 7 Remove s from OPENF 8 Add s to CLOSEDF 9 foreach s nbr(s) do 10 if s can t improve g F(s ) then 11 continue 12 if To Prune(s ) then 13 continue 14 Insert s into OPENF using priority function 16 Analogous operation for the backward search
Open Source Code Yes The code implementation and additional results, including details on state expansions and reduction percentages, can be found in the supplementary material [Wang et al., 2025]. ... [Wang et al., 2025] Yi Wang, Eyal Weiss, Bingxian Mu, and Oren Salzman. Supplementary material for meet-in-the-middle with early and efficient termination . https:// github.com/yi213-robotic/MEET, 2025. Accessed: 2025-09-01.
Open Datasets Yes We evaluate each algorithm on two grid maps from the Moving AI benchmark repository [Sturtevant, 2012], both originating from Dragon Age: Origins (DAO). These maps feature traversable regions that vary significantly across instances, providing substantial variability and ensuring a thorough evaluation of each algorithm s performance. ... Our second domain is the well-known Pancake puzzle: A state in the 10-Pancake puzzle is represented by a vector of 10 different numbers, where the goal is to sort them in ascending order through a sequence of flips from a given start state. In our experiments, we apply both GAP, as described in [Helmert, 2010] and GAP-X heuristics [Holte et al., 2017].
Dataset Splits No The first map, brc203d (274 × 391), has 1290 instances, following [Holte et al., 2017]. The second map, orz100d (412 × 395), contains 2420 instances, and is considered relatively challenging... We conduct experiments on 30 random instances, with optimal solution costs (C*) ranging between 7 and 10.
Hardware Specification Yes We ran experiments on two distinct domains: grid maps and 10-Pancake puzzles with all algorithms implemented in C++ on a desktop with 16GB RAM and an Intel i7 CPU running Ubuntu 20.04.
Software Dependencies No We ran experiments on two distinct domains: grid maps and 10-Pancake puzzles with all algorithms implemented in C++ on a desktop with 16GB RAM and an Intel i7 CPU running Ubuntu 20.04.
Experiment Setup Yes In this section we compare MEET with A*, MM and BAE*, where the latter was chosen as a representative of the non MM family of Bi-HS algorithms that provides state-of-the-art performance [Alc azar et al., 2020].2 Additionally, we conducted an ablation study for MEET to separately assess the impact of using the h heuristic instead of the static h heuristic and the impact of the pruning mechanism. We ran experiments on two distinct domains: grid maps and 10-Pancake puzzles with all algorithms implemented in C++ on a desktop with 16GB RAM and an Intel i7 CPU running Ubuntu 20.04. The implementations of A*, MM, and BAE* were adapted from the source code of [Uras and Koenig, 2015], [Holte et al., 2017], and [Hu and Speck, 2022], respectively. ... State Priority Similar to MM, MEET defines the priority of a state s in OPEND according to its f D-value. However, the heuristic used is not the static heuristic h D but an updated one which we denote by h D. Specifically, and with slight abuse of notation, f D(s) := g D(s) + h D(s) where h D(s) is initialized to be h D(s). In case s / CLOSED D {OPEN D} and g D(s) > h D(s), h D(s) is updated to be g D(s). Otherwise, if s OPEN D, h D(s) is set to be g D(s). ... We evaluate each algorithm on two grid maps from the Moving AI benchmark repository [Sturtevant, 2012], both originating from Dragon Age: Origins (DAO). ... all evaluations use the Octile and Euclidean distance heuristics. ... In our experiments, we apply both GAP, as described in [Helmert, 2010] and GAP-X heuristics [Holte et al., 2017]. The GAP-X heuristics are variants of the original GAP heuristic, devised to be less accurate. GAP-X heuristics exclude gaps that include pancakes among the smallest X in size, where the smallest X in our test dataset is 1. As X increases, GAP-X becomes less accurate. We conduct experiments on 30 random instances, with optimal solution costs (C ) ranging between 7 and 10.