Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning
Authors: Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted several experiments to examine whether we can improve the heuristics of the logic solver CADET through our deep reinforcement learning approach. |
| Researcher Affiliation | Collaboration | Gil Lederman (Electrical Engineering and Computer Sciences, University of California at Berkeley, gilled@eecs.berkeley.edu); Markus N. Rabe (Google Research, mrabe@google.com); Edward A. Lee (Electrical Engineering and Computer Sciences, University of California at Berkeley, eal@eecs.berkeley.edu); Sanjit A. Seshia (Electrical Engineering and Computer Sciences, University of California at Berkeley, sseshia@eecs.berkeley.edu) |
| Pseudocode | No | Not found. The paper describes algorithms and network architectures but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code and data of our experiments at https://github.com/lederg/learningqbf. |
| Open Datasets | Yes | We consider a set of formulas representing the search for reductions between collections of first-order formulas generated by Jordan & Kaiser (2013), which we call Reductions in the following. We additionally consider the 2QBF evaluation set of the annual competition of QBF solvers, QBFEVAL (Pulina, 2016). |
| Dataset Splits | Yes | We filtered out 2573 formulas that are solved without any heuristic decisions. In order to enable us to answer question 2 (see above), we further set aside a test set of 200 formulas, leaving us with a training set of 1835 formulas. (A split sketch appears below the table.) |
| Hardware Specification | Yes | We trained a model on the reduction problems training set for 10M steps on an AWS server of type C5. |
| Software Dependencies | No | Not found. The paper mentions the CADET solver and uses terms like GNN, but does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters that resulted in the best model are δL = 16, δC = 64, and τ = 1, leading to a model with merely 8353 parameters. Literal embedding dimension δL = 16; clause embedding dimension δC = 64; learning rate 0.0006 for the first 2M steps, then 0.0001; discount factor γ = 0.99; gradient clipping 2; number of iterations (size of graph convolution) τ = 1; minimal number of timesteps per batch 1200. (These values are collected in the sketches below the table.) |
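
The Dataset Splits row reports concrete counts: 2573 trivially solved formulas are filtered out, 200 formulas are held out for testing, and 1835 remain for training. Below is a minimal Python sketch of such a split; the file layout, the triviality check, and the random seed are assumptions on our part, not details from the paper or its repository:

```python
import random
from pathlib import Path

# Hypothetical data location and file extension; the paper's repository
# (https://github.com/lederg/learningqbf) may organize the formulas differently.
formulas = sorted(Path("data/reductions").glob("*.qdimacs"))

def is_trivially_solved(path: Path) -> bool:
    # Placeholder: in practice one would run CADET on the formula and check
    # whether it is solved without any heuristic decisions (2573 such
    # formulas in the paper). The actual solver interface is an assumption.
    return False

# Drop trivially solved formulas, then hold out a 200-formula test set,
# leaving 1835 formulas for training (per the paper's reported counts).
nontrivial = [f for f in formulas if not is_trivially_solved(f)]
rng = random.Random(0)  # the seed is an assumption; the paper does not state one
rng.shuffle(nontrivial)
test_set, train_set = nontrivial[:200], nontrivial[200:]
```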
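The Experiment Setup row lists the reported hyperparameters. The sketch below collects them into a single configuration object and writes out the two-phase learning-rate schedule; the class and field names are ours, not the repository's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CadetRLConfig:
    # Values reported in the paper; the field names are ours.
    literal_embedding_dim: int = 16   # delta_L
    clause_embedding_dim: int = 64    # delta_C
    gnn_iterations: int = 1           # tau, size of the graph convolution
    discount_factor: float = 0.99     # gamma
    gradient_clip: float = 2.0
    min_timesteps_per_batch: int = 1200
    lr_initial: float = 0.0006        # for the first 2M steps
    lr_final: float = 0.0001          # afterwards
    lr_switch_step: int = 2_000_000

def learning_rate(cfg: CadetRLConfig, step: int) -> float:
    # Two-phase schedule: 0.0006 for the first 2M steps, then 0.0001.
    return cfg.lr_initial if step < cfg.lr_switch_step else cfg.lr_final
```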
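The setup also fixes the encoder dimensions (δL = 16, δC = 64) and a single graph-convolution round (τ = 1). Here is a generic PyTorch sketch of one literal-clause message-passing round at those dimensions; it illustrates the technique only and is not a reconstruction of the paper's exact network:

```python
import torch
import torch.nn as nn

class LiteralClauseGNN(nn.Module):
    # Generic literal <-> clause message passing over the CNF incidence
    # matrix; NOT the paper's exact architecture. Only delta_L = 16,
    # delta_C = 64, and tau = 1 are taken from the paper.
    def __init__(self, d_lit: int = 16, d_cls: int = 64):
        super().__init__()
        self.lit_to_cls = nn.Linear(d_lit, d_cls)  # aggregate literals into clauses
        self.cls_to_lit = nn.Linear(d_cls, d_lit)  # send clause summaries back

    def forward(self, lit_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (num_clauses, num_literals) 0/1 incidence matrix of the formula.
        cls_emb = torch.relu(self.lit_to_cls(adj @ lit_emb))  # tau = 1: one round
        return torch.relu(lit_emb + self.cls_to_lit(adj.t() @ cls_emb))
```

With small shared embeddings and only one round of message passing, the parameter count stays tiny, which is consistent with the paper's report of a model with merely 8353 parameters.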