Injecting Logical Constraints into Neural Networks via Straight-Through Estimators
Authors: Zhun Yang, Joohyung Lee, Chiyoun Park
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints. |
| Researcher Affiliation | Collaboration | (1) School of Computing and Augmented Intelligence, Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA; (2) Samsung Research, Samsung Electronics Co., Seoul, South Korea. |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | The implementation of our method is publicly available online at https://github.com/azreasoners/cl-ste. |
| Open Datasets | Yes | We introduced the CNF encoding and the loss function for the mnistAdd problem in Example 4.1. The problem was used in (Manhaeve et al., 2018) and (Yang et al., 2020) as a benchmark. ... The following are benchmark problems from (Tsamoura et al., 2021). ... We use a typical CNF for the 9 × 9 Sudoku problem. ... The dataset in the CNN experiments consists of 70k data instances... We consider the Recurrent Relational Network (RRN) (Palm et al., 2018)... The training dataset in (Palm et al., 2018) contains 180k data instances... We use the dataset from (Xu et al., 2018)... For both tasks, we apply b(x)+iSTE to the same MLP (without softmax in the last layer) as in (Xu et al., 2018), i.e., an MLP of shape (784, 1000, 500, 250, 250, 250, 10), where the output x ∈ R^10 denotes the digit/cloth prediction. (A hedged PyTorch sketch of this MLP appears after the table.) |
| Dataset Splits | No | The paper primarily discusses training and testing splits, for example, '80%/20% for training/testing' for the Sudoku dataset and '80/20 train/test examples' for the shortest path problem. It does not explicitly mention or detail a separate validation set split. |
| Hardware Specification | Yes | All experiments in this section were done on Ubuntu 18.04.2 LTS with two 10-core Intel(R) Xeon(R) E5-2640 v4 CPUs @ 2.40GHz and four GP104 [GeForce GTX 1080] GPUs. |
| Software Dependencies | No | The paper mentions 'Ubuntu 18.04.2 LTS' as the operating system and 'PyTorch aggregate functions' in a footnote, but it does not specify version numbers for any software libraries or dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | CL-STE with a batch size of 16 (denoted by CL-STE(16)). ... The total loss function used for the mnistAdd problem is L = L_cnf(C, v, f) + Σ_{x ∈ {x1, x2}} 0.1 · L_bound(x). ... The total loss function L we used is L = L_cnf(C, v, f) + 0.1 · L_bound(x). ... RRN takes q as input and, after 32 iterations of message passing... trained for 100 epochs... trained for 60 epochs... We run experiments for 50k batch updates with a batch size of 32. (A hedged sketch of this loss composition follows the table.) |
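The "b(x)+iSTE" notation quoted above refers to binarizing a network output in the forward pass while letting gradients pass straight through in the backward pass (the identity straight-through estimator named in the paper's title). The following is a minimal PyTorch sketch of such a straight-through binarization; the function name `binarize_ste` and the 0.5 threshold are illustrative assumptions, not the authors' released code.

```python
import torch

def binarize_ste(x: torch.Tensor) -> torch.Tensor:
    """Straight-through binarization (illustrative sketch).

    Forward pass: hard-threshold x into {0, 1}.
    Backward pass: gradients flow through unchanged (identity STE),
    because the detached terms contribute no gradient.
    NOTE: the 0.5 threshold and the function name are assumptions
    made for illustration, not taken from the paper's repository.
    """
    hard = (x >= 0.5).float()
    # hard carries no gradient; re-attach x's gradient path via the identity trick.
    return hard.detach() + x - x.detach()
```

With a binarization of this kind, a CNF-based constraint loss can be evaluated on discrete outputs while gradients still reach the underlying network weights.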
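For the MNIST/FashionMNIST tasks quoted in the Open Datasets row, the paper specifies an MLP of shape (784, 1000, 500, 250, 250, 250, 10) with no softmax in the last layer, and a total loss of the form L = L_cnf + 0.1 · L_bound. Below is a minimal PyTorch sketch of such a network and of the weighted loss composition; the ReLU activations and the placeholder loss tensors are assumptions for illustration only, since the paper states only the layer widths and the loss weights.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """MLP of shape (784, 1000, 500, 250, 250, 250, 10), no softmax on the
    last layer, as quoted above. The ReLU activations between layers are an
    assumption; the paper only specifies the layer widths."""
    def __init__(self):
        super().__init__()
        sizes = [784, 1000, 500, 250, 250, 250, 10]
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:  # no activation after the final layer
                layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def total_loss(cnf_loss: torch.Tensor, bound_loss: torch.Tensor) -> torch.Tensor:
    # Hypothetical composition of L = L_cnf + 0.1 * L_bound; cnf_loss and
    # bound_loss stand in for the paper's constraint and bound losses.
    return cnf_loss + 0.1 * bound_loss
```

A typical forward call would flatten each 28×28 image to a 784-dimensional vector, e.g. `logits = MLP()(images.view(-1, 784))`, before the constraint losses are computed on the (binarized) outputs.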