Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search

Authors: Michał Zawalski, Michał Tyrolski, Konrad Czechowski, Tomasz Odrzygóźdź, Damian Stachura, Piotr Piękos, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT. ... We show the effectiveness of AdaSubS in three challenging domains: Sokoban, Rubik's Cube, and the inequality theorem prover INT (Wu et al., 2021). AdaSubS significantly surpasses hierarchical planning algorithms and sets a new state-of-the-art on INT.
Researcher Affiliation | Collaboration | Michał Zawalski, University of Warsaw, m.zawalski@uw.edu.pl; Michał Tyrolski, University of Warsaw, michal.tyrolski@gmail.com; Konrad Czechowski, University of Warsaw, k.czechowski@mimuw.edu.pl; Tomasz Odrzygóźdź, IDEAS NCBR, tomaszo@impan.pl; Damian Stachura, Jagiellonian University, damian.stachura1@gmail.com; Piotr Piękos, KAUST, piotrpiekos@gmail.com; Yuhuai Wu, Google Research & Stanford University, yuhuai@google.com; Łukasz Kuciński, Polish Academy of Sciences, lkucinski@impan.pl; Piotr Miłoś, IDEAS NCBR, Polish Academy of Sciences, deepsense.ai, pmilos@impan.pl
Pseudocode | Yes | Algorithm 1: Adaptive Subgoal Search; Algorithm 2: Conditional low-level policy; Algorithm 3: Verification algorithm; Algorithm 4: Low-level path
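The core idea behind Algorithm 1 — a best-first search over verified subgoals, trying longer-horizon subgoal generators before falling back to shorter ones — can be sketched as follows. This is a hedged illustration under assumed interfaces, not the paper's implementation: `generators`, `value`, `reach`, and `is_solved` are all stand-in names for the paper's subgoal generators, value network, and conditional policy plus verifier.

```python
import heapq

def adaptive_subgoal_search(start, generators, value, reach, is_solved, budget=1000):
    """Sketch of a best-first subgoal search in the spirit of Algorithm 1.

    generators: list of subgoal generators, ordered from longest to
        shortest horizon; each maps a state to candidate subgoals.
    value: heuristic estimate of a state's quality (higher is better).
    reach: stand-in for the conditional low-level policy + verifier;
        returns a truthy low-level path if the subgoal is reachable.
    """
    # Max-heap via negated value; a counter breaks ties without
    # comparing states directly.
    frontier = [(-value(start), 0, start)]
    counter = 1
    while frontier and budget > 0:
        _, _, state = heapq.heappop(frontier)
        if is_solved(state):
            return state
        for gen in generators:  # try the longest horizon first
            expanded = False
            for subgoal in gen(state):
                if reach(state, subgoal):  # keep only verified subgoals
                    heapq.heappush(frontier, (-value(subgoal), counter, subgoal))
                    counter += 1
                    expanded = True
            if expanded:
                break  # fall back to a shorter horizon only on failure
        budget -= 1
    return None  # search budget exhausted
```

On a toy integer domain (subgoals `s+3` or `s+1`, goal `s >= 10`), the search repeatedly expands with the long-horizon generator and reaches the goal in a few expansions.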
Open Source Code | Yes | The code of our method is available at https://github.com/AdaptiveSubgoalSearch/adaptive_subs.
Open Datasets | Yes | To collect offline trajectory datasets for Rubik's Cube, we generate random paths of length 20 starting from the solved cube and take them in reversed order. For INT we use the generator provided by Wu et al. (2021). For Sokoban, we use the expert data generated by a reinforcement learning agent (Miłoś et al., 2019). Detailed information is contained in Appendix D.
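The Rubik's Cube data-collection procedure (a random 20-move path from the solved cube, taken in reversed order so it ends at the solved state) can be sketched as below. The move encoding and helper names are assumptions for illustration; the paper's exact cube representation is described in its Appendix D.

```python
import random

# Hypothetical move set: quarter turns of the six faces, clockwise
# and counter-clockwise (the paper's exact encoding may differ).
MOVES = [face + suffix for face in "UDLRFB" for suffix in ("", "'")]

def inverse(move):
    """Invert a face turn: U <-> U'."""
    return move[:-1] if move.endswith("'") else move + "'"

def scramble_trajectory(length=20, rng=random):
    """Random walk of `length` moves starting from the solved cube,
    then reversed so the resulting trajectory ends at the solved
    state, as in the paper's data-collection setup."""
    forward = [rng.choice(MOVES) for _ in range(length)]
    # Undoing the walk: apply the inverse moves in opposite order.
    solving = [inverse(m) for m in reversed(forward)]
    return forward, solving
```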
Dataset Splits | Yes | The components of AdaSubS are trained using a dataset of offline trajectories of subsequent states and actions: (s_0, a_0), ..., (s_{n-1}, a_{n-1}), s_n. ... We performed the split of dataset D into two parts of equal size: D1 and D2. The former was used to train the subgoal generators and conditional low-level policy, while the latter was used to train the verifier network.
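The described equal-size split of D into D1 and D2 amounts to the following one-liner; the shuffle and seed here are illustrative assumptions, since the quoted text does not specify how trajectories were assigned to the halves.

```python
import random

def split_dataset(trajectories, seed=0):
    """Split offline trajectories into two equal halves: D1 (used to
    train the subgoal generators and conditional low-level policy)
    and D2 (used to train the verifier network). Shuffling with a
    fixed seed is an assumption, not stated in the paper."""
    data = list(trajectories)
    random.Random(seed).shuffle(data)
    mid = len(data) // 2
    return data[:mid], data[mid:]
```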
Hardware Specification | Yes | We used nodes equipped with a single Nvidia V100 32GB card or Nvidia RTX 2080Ti 11GB card. Each such node had 4 CPU cores and 168GB of RAM. We also used nodes equipped with an Intel Xeon E5-2697 2.60GHz CPU with 28 cores and 128GB RAM.
Software Dependencies | No | The paper mentions "mBART, a transformer from the Hugging Face library" and the "Adam optimizer" but does not specify version numbers for these software components or any other libraries.
Experiment Setup | Yes | Table 6: Hyperparameters used for training. Table 7: Hyperparameters used for evaluation in the Sokoban environment. Table 8: Hyperparameters used for evaluation in the Rubik's Cube environment. Table 9: Hyperparameters used for evaluation in the INT environment.