Levin Tree Search with Context Models

Authors: Laurent Orseau, Marcus Hutter, Levi H. S. Lelis

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques.
Researcher Affiliation | Collaboration | ¹Google DeepMind; ²Department of Computing Science, University of Alberta, Canada, and Alberta Machine Intelligence Institute (Amii), Canada
Pseudocode | Yes | Algorithm 1: Budgeted LTS with context models. Returns a solution node if any is found, or 'budget_reached' or 'no_solution'. (A minimal sketch of budgeted LTS is given below the table.)
Open Source Code | Yes | Source code: https://github.com/deepmind/levintreesearch_cm
Open Datasets | Yes | As in previous work, we train LTS+CM on the same datasets of 50 000 problems each, with the same initial budget (2000 node expansions for Sokoban and The Witness, 7000 for STP) and stop as soon as the training set is entirely solved. Training LTS+CM for these domains took less than 2 hours each. Harder Sokoban: additionally, we compare algorithms on the Boxoban hard set of 3332 problems. ... Rubik's Cube: we also use LTS+CM to learn a fast policy for the Rubik's cube, with an initial budget of B₁ = 21000. We use a sequence of datasets containing 100k problems each, generated with a random walk of between m and m̄ = m + 5 moves from the solution, where m increases by steps of 5 from 0 to 50, after which we set m = m̄ = 50 for each new generated set. (The curriculum is sketched in code below the table.)
Dataset Splits | No | The paper mentions training on '50 000 problems each' and '100k problems each' and testing on a 'separate test set', but it does not explicitly define or refer to a validation set, nor does it provide specific percentages or counts for training/validation/test splits.
Hardware Specification | Yes | Machine description: we used a single EPYC 7B12 (64 cores, 128 threads) server with 512 GB of RAM and no GPU. During training and testing, 64 problems are attempted concurrently, one problem per CPU core. Optimization uses 128 threads to compute the loss, the Jacobian, and the updates. (A sketch of this batched evaluation appears below the table.)
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) for its implementation; it only describes the algorithms and hyperparameters used.
Experiment Setup | Yes | Hyperparameters: for all experiments we use ε_low = 10⁻⁴, ε_mix = 10⁻³, and a quadratic regularization R(β) = 5‖β − β₀‖², where β₀ = (1 − 1/A) ln ε_low (see Appendix F). The convex optimization algorithm used to solve Eq. (8) is detailed in Appendix C. (A small numeric sketch of the regularizer is given below the table.)
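
The paper's Algorithm 1 (budgeted LTS with context models) is not reproduced on this page. Below is a minimal Python sketch of the budgeted Levin Tree Search loop it builds on: best-first expansion in increasing Levin cost d(n)/π(n) under a node-expansion budget, with the same return convention quoted in the Pseudocode row. The `successors`, `is_goal`, and `policy_prob` interfaces are illustrative assumptions, not the authors' API.

```python
import heapq
from itertools import count

def budgeted_lts(root, successors, is_goal, policy_prob, budget):
    """Minimal sketch of budgeted Levin Tree Search (LTS).

    Expands nodes in increasing Levin cost d(n) / pi(n), where d(n) is
    the depth of node n and pi(n) the policy probability of the path to
    n. Returns a solution node, 'budget_reached', or 'no_solution'.
    All interfaces here are illustrative, not the authors' code.
    """
    tie = count()  # tie-breaker so heapq never compares nodes directly
    # Frontier entries: (levin_cost, tie, depth, path_probability, node)
    frontier = [(0.0, next(tie), 0, 1.0, root)]
    expansions = 0
    while frontier:
        _, _, depth, prob, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        if expansions >= budget:
            return 'budget_reached'
        expansions += 1
        for child in successors(node):
            # pi(child) = pi(node) * probability the policy assigns to the action
            child_prob = prob * policy_prob(node, child)
            if child_prob <= 0.0:
                continue
            cost = (depth + 1) / child_prob  # Levin cost d(child) / pi(child)
            heapq.heappush(frontier, (cost, next(tie), depth + 1, child_prob, child))
    return 'no_solution'
```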
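
For the Rubik's cube curriculum quoted in the Open Datasets row, a hedged sketch of the dataset generator, assuming a hypothetical `scramble(state, k)` helper that applies k random moves to a state:

```python
import random

def curriculum_datasets(scramble, solved_state, n_problems=100_000):
    """Sketch of the curriculum quoted above: each dataset holds
    n_problems cube states produced by a random walk of between m and
    m_bar = m + 5 moves from the solution; m grows in steps of 5 from 0
    to 50, after which m = m_bar = 50 for every new dataset.
    `scramble(state, k)` is a hypothetical helper, not the authors' code.
    """
    m = 0
    while True:  # the caller decides when to stop requesting datasets
        m_bar = 50 if m >= 50 else m + 5
        yield [scramble(solved_state, random.randint(m, m_bar))
               for _ in range(n_problems)]
        m = min(m + 5, 50)
```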
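
The evaluation setup in the Hardware Specification row (64 problems attempted concurrently, one per CPU core) could look like the following sketch; `solve` stands in for a single-problem solver such as a budgeted LTS call and is an assumption, not the authors' code:

```python
from concurrent.futures import ProcessPoolExecutor

def solve_batch(problems, solve, max_workers=64):
    """Attempt up to 64 problems concurrently, one per CPU core.

    `solve` is a placeholder single-problem solver (e.g., a budgeted
    LTS call); it must be a picklable top-level function.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve, problems))
```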
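
Finally, a small numeric sketch of the regularizer stated in the Experiment Setup row, R(β) = 5‖β − β₀‖² with β₀ = (1 − 1/A) ln ε_low, where A denotes the number of actions; the vector layout of β is an assumption for illustration:

```python
import math
import numpy as np

def quadratic_regularizer(beta, n_actions, eps_low=1e-4, weight=5.0):
    """Sketch of R(beta) = 5 * ||beta - beta0||^2 with
    beta0 = (1 - 1/A) * ln(eps_low), as quoted above (A = number of
    actions). The shape/layout of `beta` is assumed, not specified here.
    """
    beta0 = (1.0 - 1.0 / n_actions) * math.log(eps_low)
    return weight * np.sum((beta - beta0) ** 2)

# With A = 4 actions, beta0 = 0.75 * ln(1e-4) ≈ -6.908.
print(quadratic_regularizer(np.zeros(3), n_actions=4))  # ≈ 715.76
```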