Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
Authors: Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Tomasz Odrzygóźdź, Damian Stachura, Piotr Piękos, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT. ... We show the effectiveness of AdaSubS in three challenging domains: Sokoban, Rubik's Cube, and the inequality theorem prover INT (Wu et al., 2021). AdaSubS significantly surpasses hierarchical planning algorithms and sets a new state-of-the-art on INT. |
| Researcher Affiliation | Collaboration | Michał Zawalski (University of Warsaw, m.zawalski@uw.edu.pl); Michał Tyrolski (University of Warsaw, michal.tyrolski@gmail.com); Konrad Czechowski (University of Warsaw, k.czechowski@mimuw.edu.pl); Tomasz Odrzygóźdź (IDEAS NCBR, tomaszo@impan.pl); Damian Stachura (Jagiellonian University, damian.stachura1@gmail.com); Piotr Piękos (KAUST, piotrpiekos@gmail.com); Yuhuai Wu (Google Research & Stanford University, yuhuai@google.com); Łukasz Kuciński (Polish Academy of Sciences, lkucinski@impan.pl); Piotr Miłoś (IDEAS NCBR, Polish Academy of Sciences, deepsense.ai, pmilos@impan.pl) |
| Pseudocode | Yes | Algorithm 1: Adaptive Subgoal Search; Algorithm 2: Conditional low-level policy; Algorithm 3: Verification algorithm; Algorithm 4: Low-level path |
| Open Source Code | Yes | The code of our method is available at https://github.com/AdaptiveSubgoalSearch/adaptive_subs. |
| Open Datasets | Yes | To collect offline trajectory datasets for Rubik's Cube, we generate random paths of length 20 starting from the solved cube and take them in reversed order. For INT we use the generator provided by Wu et al. (2021). For Sokoban, we use the expert data generated by a reinforcement learning agent (Miłoś et al., 2019). Detailed information is contained in Appendix D. |
| Dataset Splits | Yes | The components of AdaSubS are trained using a dataset of offline trajectories of subsequent states and actions: (s₀, a₀), ..., (sₙ₋₁, aₙ₋₁), sₙ. ... We performed the split of dataset D into two parts of equal size: D1 and D2. The former was used to train the subgoal generators and conditional low-level policy, while the latter was used to train the verifier network. |
| Hardware Specification | Yes | We used nodes equipped with a single Nvidia V100 32GB card or Nvidia RTX 2080Ti 11GB card. Each such node had 4 CPU cores and 168GB of RAM. In the latter, we used nodes equipped with Intel Xeon E5-2697 2.60GHz CPU with 28 cores and 128GB RAM. |
| Software Dependencies | No | The paper mentions "mBART, a transformer from the Hugging Face library" and the Adam optimizer, but does not specify version numbers for these software components or for any other libraries. |
| Experiment Setup | Yes | Table 6: Hyperparameters used for training. Table 7: Hyperparameters used for evaluation in the Sokoban environment. Table 8: Hyperparameters used for evaluation in the Rubik s Cube environment. Table 9: Hyperparameters used for evaluation in the INT environment. |
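The Rubik's Cube data-collection recipe quoted above (random scrambles of length 20 from the solved state, taken in reverse) and the equal split of dataset D into D1 and D2 can be sketched as follows. This is a minimal sketch, not the authors' implementation: `MOVES`, `INVERSE`, `apply_move`, and the toy environment in the usage note are hypothetical stand-ins for a real cube environment.

```python
import random

# A small hypothetical move set with its inverses; a real Rubik's Cube
# environment would supply the full 12- or 18-move set.
MOVES = ["U", "U'", "R", "R'", "F", "F'"]
INVERSE = {"U": "U'", "U'": "U", "R": "R'", "R'": "R", "F": "F'", "F'": "F"}

def reversed_scramble(solved_state, apply_move, length=20, rng=random):
    """Scramble from the solved state, then read the path backwards so it
    becomes a solving trajectory: each reversed state is paired with the
    inverse of the move that produced it. Returns a list of
    (state, action) pairs ending with (solved_state, None)."""
    states, moves = [solved_state], []
    for _ in range(length):
        m = rng.choice(MOVES)
        states.append(apply_move(states[-1], m))
        moves.append(m)
    rev_states = states[::-1]
    rev_actions = [INVERSE[m] for m in moves[::-1]]
    return list(zip(rev_states, rev_actions)) + [(rev_states[-1], None)]

def split_dataset(trajectories, rng=random):
    """Equal split: D1 (subgoal generators + conditional policy), D2 (verifier)."""
    shuffled = trajectories[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

As a usage sketch, a toy `apply_move` that appends a move and cancels adjacent inverses (free-group reduction) is enough to check that replaying the reversed actions from the scrambled state returns to the solved state; the paper's actual environment operates on cube configurations instead.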