Action Space Reduction for Planning Domains
Authors: Harsha Kokel, Junkyu Lee, Michael Katz, Kavitha Srinivas, Shirin Sohrabi
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show a significant reduction in the action label space size across a wide collection of planning domains. We demonstrate the benefit of our automated label reduction in two separate use cases: improved sample complexity of model-free reinforcement learning algorithms and speeding up successor generation in lifted planning. |
| Researcher Affiliation | Industry | Harsha Kokel , Junkyu Lee , Michael Katz , Kavitha Srinivas and Shirin Sohrabi IBM T.J. Watson Research Center, Yorktown Heights, USA {harsha.kokel, junkyu.lee, michael.katz1, kavitha.srinivas}@ibm.com, ssohrab@us.ibm.com |
| Pseudocode | Yes | Input: A lifted action o with parameters params(o) and a set of relevant lifted mutex groups L. Find: A subset X ⊆ params(o) of parameters s.t. there exist X1, …, Xk with (i) X = X1 ⊂ X2 ⊂ … ⊂ Xk = params(o), and (ii) Xi+1 = Xi ∪ vc(l) for some l ∈ L s.t. vf(l) ⊆ Xi. ... To solve the parameter seed set problem, we cast it as a (delete-free) STRIPS planning task with operation costs. We first find a set L of relevant LMGs. Then, for each lifted action o we define a separate planning task Πo = ⟨Lo, Oo, Io, Go⟩, where the language Lo contains a single predicate mark and an object for each parameter in params(o). The set Oo consists of two types of actions: 1. seedx actions are defined for each parameter x ∈ params(o) as seedx := ⟨seedx, log(\|D(x)\|), ∅, {mark(x)}⟩, 2. getl actions are defined for each relevant LMG l as getl := ⟨getl, 0, {mark(x) \| x ∈ vf(l)}, {mark(y) \| y ∈ vc(l)}⟩. Initial state Io = ∅. Goal state Go = {mark(x) \| x ∈ params(o)}. |
| Open Source Code | Yes | The code and supplementary material are available at https://github.com/IBM/Parameter-Seed-Set. |
| Open Datasets | Yes | We compare the size of label sets, obtained with and without the proposed reduction, on a representative set of 14 STRIPS domains from various IPC (using the typed versions where available) and 10 hard-to-ground (HTG) domains. ... We generate 500 unique pairs of initial and goal states in each domain. Of these, 250 pairs were used in training and the remaining were set aside for evaluation. |
| Dataset Splits | No | The paper mentions training and evaluation/test sets but does not explicitly describe a separate validation split for reproducing the data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions using "Fast Downward" and "ACME RL library" but no hardware specifications. |
| Software Dependencies | No | The paper mentions software like "Fast Downward [Helmert, 2006]", "implementation by Fišer [2020]", and "ACME RL library [Hoffman et al., 2020]" but does not provide specific version numbers for these software components, which are necessary for reproducible software dependencies. |
| Experiment Setup | No | The paper describes some aspects of the experimental setup, such as using 500 unique initial and goal state pairs for RL experiments, and using h_FF as a dense reward function with Double DQN from ACME. However, it lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations needed for full reproducibility. |
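The parameter seed set problem quoted in the Pseudocode row can be illustrated with a small sketch. The paper solves it by compiling each lifted action into a delete-free STRIPS task and planning; the snippet below instead brute-forces the same objective directly, to make the semantics concrete: seeding a parameter x costs log\|D(x)\| (mirroring the seedx action costs), a relevant LMG l extends a marked set Xi to Xi ∪ vc(l) for free whenever vf(l) ⊆ Xi (mirroring the getl actions), and the goal is to cover all of params(o) at minimum cost. The function and variable names here are illustrative, not from the authors' released code, and brute force is only viable for the few parameters a single lifted action typically has.

```python
import math
from itertools import combinations

def closure(marked, lmgs):
    """Apply get_l actions to a fixpoint: whenever every fixed
    variable vf(l) of an LMG l is marked, its counted variables
    vc(l) become marked at zero cost."""
    marked = set(marked)
    changed = True
    while changed:
        changed = False
        for vf, vc in lmgs:
            if vf <= marked and not vc <= marked:
                marked |= vc
                changed = True
    return marked

def min_seed_set(params, dom_size, lmgs):
    """Brute-force the cheapest seed set X subset of params whose
    closure under the relevant LMGs covers all parameters.  The
    cost of seeding x is log|D(x)|, as in the seed_x actions."""
    best, best_cost = None, float("inf")
    for k in range(len(params) + 1):
        for X in combinations(params, k):
            if closure(X, lmgs) == set(params):
                cost = sum(math.log(dom_size[x]) for x in X)
                if cost < best_cost:
                    best, best_cost = set(X), cost
    return best, best_cost
```

For example, with parameters x, y, z (each with domain size 10) and LMGs vf={x}→vc={y} and vf={y}→vc={z}, seeding {x} alone suffices, so the minimal seed set costs log 10 instead of 3·log 10 — the kind of action-label reduction the paper measures.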