Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning

Authors: Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted several experiments to examine whether we can improve the heuristics of the logic solver CADET through our deep reinforcement learning approach.
Researcher Affiliation | Collaboration | Gil Lederman (Electrical Engineering and Computer Sciences, University of California at Berkeley, gilled@eecs.berkeley.edu); Markus N. Rabe (Google Research, mrabe@google.com); Edward A. Lee (Electrical Engineering and Computer Sciences, University of California at Berkeley, eal@eecs.berkeley.edu); Sanjit A. Seshia (Electrical Engineering and Computer Sciences, University of California at Berkeley, sseshia@eecs.berkeley.edu)
Pseudocode | No | Not found. The paper describes algorithms and network architectures but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the code and data of our experiments at https://github.com/lederg/learningqbf.
Open Datasets | Yes | We consider a set of formulas representing the search for reductions between collections of first-order formulas generated by Jordan & Kaiser (2013), which we call Reductions in the following. We additionally consider the 2QBF evaluation set of the annual competition of QBF solvers, QBFEVAL (Pulina, 2016).
Dataset Splits | Yes | We filtered out 2573 formulas that are solved without any heuristic decisions. In order to enable us to answer question 2 (see above), we further set aside a test set of 200 formulas, leaving us with a training set of 1835 formulas. (See the split sketch below the table.)
Hardware Specification | Yes | We trained a model on the reduction problems training set for 10M steps on an AWS server of type C5.
Software Dependencies | No | Not found. The paper mentions the CADET solver and uses terms like GNN, but does not specify software dependencies with version numbers.
Experiment Setup | Yes | The hyperparameters that resulted in the best model are δL = 16, δC = 64, and τ = 1, leading to a model with merely 8353 parameters. Literal embedding dimension: δL = 16; clause embedding dimension: δC = 64; learning rate: 0.0006 for the first 2m steps, then 0.0001; discount factor: γ = 0.99; gradient clipping: 2; number of iterations (size of graph convolution): 1; minimal number of timesteps per batch: 1200.
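The hyperparameter values quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary below only restates the reported numbers; the key names are our own labels and are not identifiers from the authors' repository.

```python
# Hyperparameters reported for the best model (key names are our own labels,
# not taken from the released code at https://github.com/lederg/learningqbf).
BEST_MODEL_CONFIG = {
    "literal_embedding_dim": 16,       # delta_L
    "clause_embedding_dim": 64,        # delta_C
    "temperature_tau": 1,              # tau
    "learning_rate_initial": 6e-4,     # used for the first 2m steps
    "learning_rate_final": 1e-4,       # used afterwards
    "discount_factor": 0.99,           # gamma
    "gradient_clipping": 2,
    "graph_convolution_iterations": 1, # size of graph convolution
    "min_timesteps_per_batch": 1200,
}
```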
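The split described in the Dataset Splits row (filter out trivially solved formulas, hold out 200 for testing, keep 1835 for training) could be reproduced along these lines. This is a minimal sketch, assuming the formulas are available as a list of file paths and that a predicate can report whether CADET solves a formula without any heuristic decisions; `formula_paths` and `solved_without_decisions` are hypothetical names, not the authors' API.

```python
import random

def split_reductions_dataset(formula_paths, solved_without_decisions,
                             test_size=200, seed=0):
    """Filter trivially solved formulas, then hold out a fixed-size test set.

    `formula_paths`: list of paths to QBF formula files (assumption).
    `solved_without_decisions`: predicate that runs the solver and reports
    whether the formula was solved with zero heuristic decisions (assumption).
    """
    # Drop formulas that the solver finishes without any heuristic decisions.
    remaining = [p for p in formula_paths if not solved_without_decisions(p)]

    # Shuffle deterministically, then hold out `test_size` formulas for testing.
    rng = random.Random(seed)
    rng.shuffle(remaining)
    test_set = remaining[:test_size]
    train_set = remaining[test_size:]
    return train_set, test_set
```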