Action Space Reduction for Planning Domains

Authors: Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning.
Researcher Affiliation | Industry | Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas and Shirin Sohrabi, IBM T.J. Watson Research Center, Yorktown Heights, USA; {harsha.kokel, junkyu.lee, michael.katz1, kavitha.srinivas}@ibm.com, ssohrab@us.ibm.com
Pseudocode | Yes | Input: A lifted action o with parameters params(o) and a set of relevant lifted mutex groups L. Find: A subset X ⊆ params(o) of parameters such that there exist X_1, ..., X_k with (i) X = X_1 ⊆ X_2 ⊆ ... ⊆ X_k = params(o), and (ii) X_{i+1} = X_i ∪ vc(l) for some l ∈ L such that vf(l) ⊆ X_i. ... To solve the parameter seed set problem, we cast it as a (delete-free) STRIPS planning task with operation costs. We first find a set L of relevant LMGs. Then, for each lifted action o, we define a separate planning task Π_o = ⟨L_o, O_o, I_o, G_o⟩, where the language L_o contains a single predicate mark and an object for each parameter in params(o). The set O_o consists of two types of actions: 1. seed_x actions, defined for each parameter x ∈ params(o) as seed_x := ⟨seed_x, log(|D(x)|), ∅, {mark(x)}⟩; 2. get_l actions, defined for each relevant LMG l as get_l := ⟨get_l, 0, {mark(x) | x ∈ vf(l)}, {mark(y) | y ∈ vc(l)}⟩. The initial state is I_o = ∅ and the goal is G_o = {mark(x) | x ∈ params(o)}. (A compilation sketch is given after the table.)
Open Source Code | Yes | The code and supplementary material are available at https://github.com/IBM/Parameter-Seed-Set.
Open Datasets | Yes | We compare the size of label sets, obtained with and without the proposed reduction, on a representative set of 14 STRIPS domains from various IPCs (using the typed versions where available) and 10 hard-to-ground (HTG) domains. ... We generate 500 unique pairs of initial and goal states in each domain. Of these, 250 pairs were used in training and the remaining were set aside for evaluation. (A split sketch is given after the table.)
Dataset Splits | No | The paper mentions training and evaluation/test sets but does not explicitly describe a separate validation split for reproducing the data partitioning.
Hardware Specification | No | The paper does not explicitly describe specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions using "Fast Downward" and the "ACME RL library" but gives no hardware specifications.
Software Dependencies | No | The paper mentions software like "Fast Downward [Helmert, 2006]", the "implementation by Fišer [2020]", and the "ACME RL library [Hoffman et al., 2020]" but does not provide specific version numbers for these software components, which are necessary for reproducible software dependencies.
Experiment Setup | No | The paper describes some aspects of the experimental setup, such as using 500 unique initial and goal state pairs for the RL experiments and using the h^FF heuristic as a dense reward function with Double DQN agents from ACME. However, it lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs) and detailed system-level training configurations needed for full reproducibility. (A hedged reward-shaping sketch is given after the table.)
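
The delete-free STRIPS compilation quoted in the Pseudocode row can be sketched in a few lines of Python. This is a minimal illustration only: the dataclasses, field names, and the tuple layout ⟨name, cost, precondition, add effects⟩ are assumptions made for readability and do not correspond to the classes in the IBM/Parameter-Seed-Set repository.

```python
import math
from dataclasses import dataclass


# Hypothetical stand-ins for the paper's lifted structures; the actual
# classes in the IBM/Parameter-Seed-Set repository will differ.
@dataclass(frozen=True)
class LiftedMutexGroup:
    fixed: frozenset      # vf(l): parameters that must already be marked
    counted: frozenset    # vc(l): parameters this LMG then yields for free


@dataclass(frozen=True)
class StripsAction:
    name: str
    cost: float
    pre: frozenset        # mark(x) atoms required
    add: frozenset        # mark(x) atoms added (delete-free: no deletes)


def seed_set_task(params, domain_sizes, lmgs):
    """Build the per-action task Pi_o = <L_o, O_o, I_o, G_o>.

    params        -- params(o), the parameters of one lifted action o
    domain_sizes  -- |D(x)| for every parameter x
    lmgs          -- the relevant lifted mutex groups L for o
    """
    actions = []
    # 1. seed_x: pay log|D(x)| to mark parameter x unconditionally.
    for x in params:
        actions.append(StripsAction(
            name=f"seed_{x}",
            cost=math.log(domain_sizes[x]),
            pre=frozenset(),
            add=frozenset({("mark", x)})))
    # 2. get_l: once all of vf(l) is marked, mark vc(l) at zero cost.
    for i, l in enumerate(lmgs):
        actions.append(StripsAction(
            name=f"get_{i}",
            cost=0.0,
            pre=frozenset(("mark", x) for x in l.fixed),
            add=frozenset(("mark", y) for y in l.counted)))
    init = frozenset()                              # I_o is empty
    goal = frozenset(("mark", x) for x in params)   # G_o marks every parameter
    return actions, init, goal
```

A cost-optimal plan for this task pays log(|D(x)|) only for the parameters it chooses to seed and reaches all remaining parameters through zero-cost get actions, which is exactly the trade-off the parameter seed set problem asks to minimize.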
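
The 500-pair generation and 250/250 split reported in the Open Datasets row could be reproduced along the following lines. The pair list, the fixed train size, and the random seed are assumptions; the paper's actual generation procedure and ordering are not specified in the row above.

```python
import random


def split_pairs(pairs, train_size=250, seed=0):
    """Split unique (initial state, goal) pairs into train and eval sets.

    `pairs` is assumed to hold the 500 unique pairs generated for one
    domain; the shuffle seed is an arbitrary choice, not taken from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]
```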
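
The Experiment Setup row mentions using h^FF as a dense reward with Double DQN agents from the ACME library. The wrapper below is only a hedged illustration of one common way to turn a heuristic into a dense reward (reward proportional to the decrease in h^FF); it is written against a generic Gym-style environment rather than the authors' ACME configuration, and compute_hff is a hypothetical hook into an h^FF evaluator, not an API from the paper or its code.

```python
class DenseHFFRewardWrapper:
    """Sketch: shape the reward by the drop in h^FF toward the goal.

    `env` is assumed to follow the classic Gym interface
    (reset() -> state, step(action) -> (state, reward, done, info));
    `compute_hff(state, goal)` is a hypothetical heuristic evaluator.
    """

    def __init__(self, env, goal, compute_hff):
        self.env = env
        self.goal = goal
        self.compute_hff = compute_hff
        self._prev_h = None

    def reset(self):
        state = self.env.reset()
        self._prev_h = self.compute_hff(state, self.goal)
        return state

    def step(self, action):
        state, _, done, info = self.env.step(action)
        h = self.compute_hff(state, self.goal)
        reward = self._prev_h - h   # positive whenever h^FF decreases
        self._prev_h = h
        return state, reward, done, info
```

Shaping of this form keeps the reward signal dense even when goal states are rarely reached; whether the authors used this exact form or the raw heuristic value is not stated in the row above.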