Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning
Authors: Shumao Zhai, Ning Ge
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations show that RDC-SAT significantly improves the performance of D&C solvers on phase transition random 3-SAT datasets and generalizes well to the SAT Competition 2023 dataset, substantially outperforming traditional splitting heuristics. |
| Researcher Affiliation | Academia | Shumao Zhai (Beihang University); Ning Ge (Beihang University; State Key Laboratory of Complex & Critical Software Environment (CCSE)). Corresponding author: Ning Ge (EMAIL) |
| Pseudocode | Yes | Algorithm 1: Painless Solver::step. Input: action. Output: state, done |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Our training dataset comes from the dataset provided by the study (Cameron et al., 2020). We first screened 400 problems from the SAT Competition 2023 (Balyo et al., 2023) dataset and selected those with a total number of variables and clauses less than one million, resulting in 305 SAT problems. |
| Dataset Splits | No | Our training dataset comes from the dataset provided by the study (Cameron et al., 2020). This study provides 11 datasets ranging from 100 to 600 variables, each containing 5000 satisfiable (SAT) and 5000 unsatisfiable (UNSAT) problems. We randomly selected 500 SAT and 500 UNSAT problems from the datasets with 300 and 350 variables as our training dataset. To investigate whether RDC-SAT has effectively learned the splitting heuristics, we randomly selected 200 SAT instances from the random dataset at the phase transition. This set includes 100 problems from the 450-1917 dataset and 100 from the 500-2129 dataset, with a 1:1 ratio of satisfiable to unsatisfiable instances. The paper specifies the composition of the training and evaluation sets (e.g., 500 SAT and 500 UNSAT for training, a 1:1 ratio for evaluation) but does not provide train/validation/test split percentages or the exact instance lists needed to reproduce the splits. |
| Hardware Specification | Yes | This training operation is hosted on a Linux server equipped with an Intel(R) Xeon(R) Platinum 8373C CPU, featuring 72 physical cores and 144 logical cores, supplemented by 256 GB of memory and a Tesla A100 GPU with 80 GB of memory. |
| Software Dependencies | No | We implemented our neural network using PyTorch (Paszke et al., 2019) and PyTorch Geometric (PyG) (Fey & Lenssen, 2019). The paper mentions PyTorch and PyTorch Geometric with citations to their original papers, but does not provide specific version numbers for these software dependencies as required for reproducibility. |
| Experiment Setup | Yes | Table 2 lists the hyperparameters used in the experiments. Painless Env configuration: timeout = 1000 (maximum time in seconds allowed for exploration); ncpus = 16 (threads used by each exploration agent); use_learnt = 2 (in addition to the original clauses, two tiers of learnt clauses, core and tier2, are used to build the graph); node_feats = 4; edge_feats = 2; global_feats = 4; use_LRB = true (use LRB as the branching heuristic). Network configuration: encoder_out_dims = (32, 32, 32), core_out_dims = (64, 64, 32), decoder_out_dims = (32, 1, 32) (node, edge, global output dimensions); core_steps = 4 (processing steps in the Processor); n_hidden = 1 (hidden layers in the Processor's MLP); hidden_size = 64 (hidden-layer size in the GNN's MLP); e2v_aggregator = sum + BN (edge-to-vertex aggregation; after sum aggregation, batch normalization (Ioffe & Szegedy, 2015) normalizes the features); independent_block_layers = 0 (hidden layers of the Encoder and Decoder); n_hidden_actor = 2 and n_hidden_critic = 2 (hidden layers of the actor's and critic's MLPs); activation function = LeakyReLU (Maas et al., 2013). DPPO configuration: optimizer = Adam (Kingma & Ba, 2014); lr_gnn = 1e-3; lr_actor = 1e-3; lr_critic = 1e-3; policy_loss_weight = 1.0; value_loss_weight = 0.5; entropy_loss_weight = 0.1; eps_clip = 0.2; epsilon_start = 1.0, epsilon_final = 0.01, epsilon_decay = 200 (epsilon-greedy exploration schedule); normalize_reward = True; gamma = 1.0; train_epochs = 1; train_max_steps = 10000; sample_max_nodes = 300000 (maximum nodes per sampled batch); num_of_process = 5 (exploration processes). |
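The `step(action) -> (state, done)` interface named in the Pseudocode row follows the usual RL environment pattern. The sketch below illustrates that shape only; the class name, internals, and observation format are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an environment exposing the step(action) -> (state, done)
# interface from Algorithm 1. All names and internals are illustrative assumptions.
class PainlessEnv:
    """Wraps a divide-and-conquer SAT solver as an RL environment."""

    def __init__(self, timeout=1000, ncpus=16):
        self.timeout = timeout  # maximum exploration budget (paper uses seconds)
        self.ncpus = ncpus      # threads per exploration agent
        self.elapsed = 0

    def step(self, action):
        """Apply a splitting decision; return the next observation and a done flag.

        `action` selects the variable on which to split the current sub-formula;
        `state` stands in for the graph observation fed to the GNN.
        """
        self.elapsed += 1  # placeholder for wall-clock accounting
        state = self._observe(action)
        done = self.elapsed >= self.timeout
        return state, done

    def _observe(self, action):
        # Placeholder: a real implementation would rebuild the clause graph
        # (original clauses plus core/tier2 learnt clauses) after the split.
        return {"last_action": action}
```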
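The training-set construction quoted in the Dataset Splits row (500 SAT plus 500 UNSAT problems sampled from the 300- and 350-variable pools) can be sketched as below. The function name, seed, and pool representation are assumptions; the paper does not state how its random selection was seeded.

```python
import random

# Sketch of the training-set construction described above: sample 500 SAT and
# 500 UNSAT problems from the candidate pools. Names and the fixed seed are
# illustrative assumptions, not the authors' procedure.
def build_training_set(sat_pool, unsat_pool, n_per_class=500, seed=0):
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sat = rng.sample(sat_pool, n_per_class)      # sample without replacement
    unsat = rng.sample(unsat_pool, n_per_class)
    return sat + unsat
```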
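The DPPO weights in the Experiment Setup row (policy 1.0, value 0.5, entropy 0.1, clip 0.2) fit the standard clipped-PPO objective. This scalar sketch shows how those weights combine into one loss; it uses plain floats in place of batched tensors and is not the authors' code.

```python
import math

# Minimal clipped-PPO loss using the weights reported in Table 2
# (policy 1.0, value 0.5, entropy 0.1, eps_clip 0.2). Scalar stand-in
# for batched tensors; illustrative only.
def ppo_loss(log_prob, old_log_prob, advantage, value, value_target, entropy,
             eps_clip=0.2, w_policy=1.0, w_value=0.5, w_entropy=0.1):
    ratio = math.exp(log_prob - old_log_prob)            # pi_new / pi_old
    clipped = max(min(ratio, 1.0 + eps_clip), 1.0 - eps_clip)
    policy_loss = -min(ratio * advantage, clipped * advantage)
    value_loss = (value - value_target) ** 2             # squared TD error
    # Entropy is subtracted so that maximizing entropy lowers the loss.
    return w_policy * policy_loss + w_value * value_loss - w_entropy * entropy
```

With identical old and new log-probabilities the ratio is 1, so the policy term reduces to minus the advantage, which is a quick sanity check on any PPO implementation.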
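The epsilon-greedy parameters (epsilon_start 1.0, epsilon_final 0.01, epsilon_decay 200) are consistent with a common exponential decay schedule. The exact formula is an assumption; the paper only reports the three values.

```python
import math

# A common exponential epsilon schedule matching the reported
# start/final/decay values; the precise formula is an assumption.
def epsilon(step, start=1.0, final=0.01, decay=200):
    return final + (start - final) * math.exp(-step / decay)
```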