Levin Tree Search with Context Models
Authors: Laurent Orseau, Marcus Hutter, Levi H. S. Lelis
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The new LTS+CM algorithm compares favorably against LTS+NN on several benchmarks: Sokoban (Boxoban), The Witness, and the 24-Sliding Tile puzzle (STP). The difference is particularly large on STP, where LTS+NN fails to solve most of the test instances while LTS+CM solves each test instance in a fraction of a second. Furthermore, we show that LTS+CM is able to learn a policy that solves the Rubik's cube in only a few hundred expansions, which considerably improves upon previous machine learning techniques. |
| Researcher Affiliation | Collaboration | 1 Google DeepMind; 2 Department of Computing Science, University of Alberta, Canada and Alberta Machine Intelligence Institute (Amii), Canada |
| Pseudocode | Yes | Algorithm 1: Budgeted LTS with context models. Returns a solution node if any is found, or 'budget_reached' or 'no_solution'. (An illustrative search-loop sketch is given below the table.) |
| Open Source Code | Yes | Source code: https://github.com/deepmind/levintreesearch_cm |
| Open Datasets | Yes | As in previous work, we train LTS+CM on the same datasets of 50 000 problems each, with the same initial budget (2000 node expansions for Sokoban and The Witness, 7000 for STP) and stop as soon as the training set is entirely solved. Training LTS+CM for these domains took less than 2 hours each. Harder Sokoban: Additionally, we compare algorithms on the Boxoban hard set of 3332 problems. ... Rubik's Cube: We also use LTS+CM to learn a fast policy for the Rubik's cube, with an initial budget of B1 = 21000. We use a sequence of datasets containing 100k problems each, generated with a random walk of between m and m̄ = m + 5 moves from the solution, where m increases by steps of 5 from 0 to 50, after which we set m = m̄ = 50 for each new generated set. (The scramble-length schedule is sketched below the table.) |
| Dataset Splits | No | The paper mentions training on '50 000 problems each' and '100k problems each' and testing on a 'separate test set', but does not explicitly define or refer to a validation set, nor does it provide specific percentages or counts for training/validation/test splits. |
| Hardware Specification | Yes | Machine description: We used a single EPYC 7B12 (64 cores, 128 threads) server with 512 GB of RAM and no GPU. During training and testing, 64 problems are attempted concurrently, one problem per CPU core. Optimization uses 128 threads to calculate the loss, Jacobian, and updates. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, etc.) for its implementation. It only describes the algorithms and hyperparameters used. |
| Experiment Setup | Yes | Hyperparameters: For all experiments we use ε_low = 10^-4, ε_mix = 10^-3, and a quadratic regularization R(β) = 5‖β − β0‖², where β0 = (1 − 1/A) ln ε_low (see Appendix F). The convex optimization algorithm we use to solve Eq. (8) is detailed in Appendix C. (A sketch of the regularization term is given below the table.) |
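
The Pseudocode row above describes a budgeted Levin Tree Search. The sketch below illustrates the general shape of such a search loop, assuming a best-first expansion order by the Levin cost (d(n) + 1) / π(n) and a hard cap on node expansions. It is not the paper's Algorithm 1: the `successors`, `is_goal`, and `policy` callables are illustrative stand-ins, and in LTS+CM the policy distribution would come from the learned context models.

```python
import heapq
import itertools


def budgeted_lts(start, successors, is_goal, policy, budget):
    """Sketch of a budgeted Levin Tree Search loop (not the paper's exact Algorithm 1).

    Nodes are expanded in increasing order of the Levin cost (d(n) + 1) / pi(n),
    where d(n) is the node's depth and pi(n) the product of policy probabilities
    along the path from the root.  Returns a goal state if one is found,
    'budget_reached' once `budget` expansions have been made, or 'no_solution'
    if the frontier empties first.
    """
    tie = itertools.count()  # breaks ties so states are never compared directly
    frontier = [(1.0, next(tie), start, 1.0, 0)]  # (cost, tie, state, pi, depth)
    expansions = 0
    while frontier:
        _, _, state, pi, depth = heapq.heappop(frontier)
        if is_goal(state):
            return state
        if expansions >= budget:
            return "budget_reached"
        expansions += 1
        probs = policy(state)  # dict: action -> probability (assumed interface)
        for action, child in successors(state):
            child_pi = pi * probs.get(action, 0.0)
            if child_pi > 0.0:
                cost = (depth + 2) / child_pi  # Levin cost of the child at depth + 1
                heapq.heappush(frontier, (cost, next(tie), child, child_pi, depth + 1))
    return "no_solution"
```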
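
The Rubik's cube curriculum quoted in the Open Datasets row can be made concrete with a small sketch. The schedule below is one interpretation of the quoted description; `random_walk` is a hypothetical helper (not from the paper) that scrambles a solved cube with a given number of random moves.

```python
import random


def scramble_length_schedule(num_datasets, step=5, max_moves=50):
    """Yield (m, m_bar) scramble-length bounds for each successive training set."""
    m = 0
    for _ in range(num_datasets):
        m_bar = min(m + step, max_moves)
        yield m, m_bar
        m = min(m + step, max_moves)  # once m reaches 50, every set uses m = m_bar = 50


def generate_dataset(m, m_bar, num_problems, random_walk):
    """Build one dataset by scrambling a solved cube k moves, k uniform in [m, m_bar]."""
    return [random_walk(random.randint(m, m_bar)) for _ in range(num_problems)]
```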
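
The quadratic regularizer quoted in the Experiment Setup row can be written out directly. The function below is a minimal sketch, assuming β is a flat parameter vector and A is the number of actions; it is not the paper's optimizer, which solves Eq. (8) with the convex method of its Appendix C, and `data_loss` merely stands in for that loss.

```python
import numpy as np


def regularized_loss(data_loss, beta, num_actions, eps_low=1e-4, reg_coef=5.0):
    """Add the quadratic regularizer R(beta) = 5 * ||beta - beta0||^2 to a given loss.

    beta0 = (1 - 1/A) * ln(eps_low) is the reference value the parameters are
    pulled towards; `data_loss` is a placeholder for the convex loss of Eq. (8).
    """
    beta0 = (1.0 - 1.0 / num_actions) * np.log(eps_low)
    return data_loss + reg_coef * np.sum((np.asarray(beta) - beta0) ** 2)
```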