Convolutional Differentiable Logic Gate Networks

Authors: Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, Stefano Ermon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental — "On CIFAR-10, we achieve an accuracy of 86.29% using only 61 million logic gates, which improves over the SOTA while being 29× smaller." "Table 1: Main results for the CIFAR-10 experiments. Our Logic Tree Net models reduce the required numbers of logic gates by factors of 29× compared to the state-of-the-art models. Our models are scaled to match accuracies."
Researcher Affiliation: Collaboration — Felix Petersen (Stanford University, Infty Labs Research, mail@felix-petersen.de); Hilde Kuehne (Tuebingen AI Center, MIT-IBM Watson AI Lab, h.kuehne@uni-tuebingen.de); Christian Borgelt (University of Salzburg, christian@borgelt.net); Julian Welzel (Infty Labs Research, welzel@inftylabs.com); Stefano Ermon (Stanford University, ermon@cs.stanford.edu)
Pseudocode: No — No pseudocode or algorithm blocks were found.
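Although the paper contains no pseudocode, the core mechanism behind differentiable logic gate networks is compact enough to sketch. The following is a minimal illustrative PyTorch reimplementation, not the authors' fused CUDA kernels: each gate holds a learnable distribution over the 16 two-input Boolean functions, relaxed to real-valued arithmetic so gradients can flow; the class and function names here are ours, chosen for illustration.

```python
import torch
import torch.nn.functional as F

def all_16_ops(a, b):
    """Real-valued relaxations of the 16 two-input Boolean functions.

    For a, b in [0, 1], each expression reduces to the corresponding
    Boolean truth table on {0, 1}, e.g. AND -> a*b, XOR -> a + b - 2ab.
    """
    return torch.stack([
        torch.zeros_like(a),      # FALSE
        a * b,                    # a AND b
        a - a * b,                # a AND NOT b
        a,                        # a
        b - a * b,                # NOT a AND b
        b,                        # b
        a + b - 2 * a * b,        # a XOR b
        a + b - a * b,            # a OR b
        1 - (a + b - a * b),      # a NOR b
        1 - (a + b - 2 * a * b),  # a XNOR b
        1 - b,                    # NOT b
        1 - b + a * b,            # a OR NOT b
        1 - a,                    # NOT a
        1 - a + a * b,            # NOT a OR b
        1 - a * b,                # a NAND b
        torch.ones_like(a),       # TRUE
    ], dim=-1)

class SoftLogicGateLayer(torch.nn.Module):
    """A layer of differentiable logic gates with fixed random wiring."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Each output gate reads two randomly chosen input coordinates.
        self.register_buffer("idx_a", torch.randint(in_dim, (out_dim,)))
        self.register_buffer("idx_b", torch.randint(in_dim, (out_dim,)))
        # Learnable logits over the 16 candidate Boolean functions per gate.
        self.logits = torch.nn.Parameter(torch.randn(out_dim, 16))

    def forward(self, x):
        a, b = x[..., self.idx_a], x[..., self.idx_b]
        ops = all_16_ops(a, b)                  # (batch, out_dim, 16)
        probs = F.softmax(self.logits, dim=-1)  # (out_dim, 16)
        return (ops * probs).sum(-1)            # soft mixture per gate
```

After training, each gate is discretized to its argmax operation, so inference requires only hard Boolean logic.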
Open Source Code: Yes — "We will make the code publicly available by including it into the difflogic library at github.com/Felix-Petersen/difflogic."
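For reference, the existing difflogic library (into which the authors say the convolutional code will be merged) already exposes fully connected differentiable logic layers. A minimal model along the lines of the library's README might look as follows; the layer widths and the GroupSum temperature are illustrative, not the paper's configuration, and the convolutional layers described in this paper were not yet part of the release:

```python
import torch
from difflogic import LogicLayer, GroupSum  # pip install difflogic

# A small fully connected logic gate network for flattened 28x28 inputs.
# The library provides CUDA and pure-Python implementations of LogicLayer.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    LogicLayer(in_dim=784, out_dim=8_000),
    LogicLayer(in_dim=8_000, out_dim=8_000),
    LogicLayer(in_dim=8_000, out_dim=8_000),
    GroupSum(k=10, tau=30),  # aggregate gate outputs into 10 class scores
)
```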
Open Datasets: Yes — "We train five sizes of Logic Tree Nets on the CIFAR-10 data set [9] using the AdamW optimizer [12], [33] with a batch size of 128 at a learning rate of 0.02." "We continue our evaluation on MNIST [8]."
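Both data sets are available through standard tooling. The paper does not prescribe a loader; a typical torchvision setup (our assumption) would be:

```python
import torchvision
import torchvision.transforms as T

# ToTensor yields values in [0, 1]; logic gate networks typically
# threshold inputs further to low bit width before the first layer.
transform = T.ToTensor()

cifar_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
mnist_train = torchvision.datasets.MNIST("./data", train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.MNIST("./data", train=False, download=True, transform=transform)
```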
Dataset Splits: Yes — "For CIFAR-10, we split the training data into 45 000 training images and 5 000 validation images, and evaluate every 2 000 steps to select the best model. For MNIST, we split the training data into 50 000 training images and 10 000 validation images, and evaluate every 5 000 steps to select the best model."
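These splits carve the validation set out of the standard training sets (CIFAR-10: 50 000 → 45 000 + 5 000; MNIST: 60 000 → 50 000 + 10 000). The paper does not say whether the split is random or stratified; a plain random split, reusing the datasets loaded above, would be:

```python
import torch
from torch.utils.data import random_split

generator = torch.Generator().manual_seed(0)  # seed is our assumption, for reproducibility
cifar_tr, cifar_val = random_split(cifar_train, [45_000, 5_000], generator=generator)
mnist_tr, mnist_val = random_split(mnist_train, [50_000, 10_000], generator=generator)
```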
Hardware Specification: Yes — "The speed of our convolutional layer is up to 200× faster per logic gate than existing randomly connected LGN implementations [7]." "We developed efficient fully-fused low-level CUDA kernels, which, for the first time, enable training of convolutional LGNs. The typical training time per epoch for the L model on a single NVIDIA RTX 4090 GPU is 30 seconds." "On CIFAR-10 we limit the hardware development up to the base model (B) due to labor cost. In Table 2, we report the results. We can observe a very favorable FPGA timing trade-off compared to previous works. Indeed, using our model (B) we achieve 80.17% accuracy, matching the accuracy of the FINN accelerator, but decreasing inference time from 45.6 µs to 24 ns. On MNIST, our model improves accuracy while achieving 160× faster inference speed, and on CIFAR-10, our model improves inference speed by 1900× over the state-of-the-art. We illustrate the placement of our MNIST model (M) on a Xilinx XC7Z045 FPGA in Figure 9."
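The reported inference speeds stem from the fact that a discretized LGN needs only Boolean operations, which batch naturally across the bits of a machine word (difflogic's compiled-C inference path uses this trick; the FPGA deployment goes further by mapping gates directly to LUTs). A hedged NumPy sketch of word-level bit packing, with made-up wiring and a single gate type for simplicity, is:

```python
import numpy as np

def eval_nand_layer(x_bits, idx_a, idx_b):
    """Evaluate one hard logic gate layer for 64 samples at once.

    x_bits: (in_dim,) uint64, where bit j of each word belongs to sample j.
    Here every gate is a NAND purely for illustration; a trained network
    would apply each gate's own discretized Boolean operation.
    """
    return ~(x_bits[idx_a] & x_bits[idx_b])

rng = np.random.default_rng(0)
x_bits = rng.integers(0, 2**63, size=128, dtype=np.uint64)  # 128 features x 64 samples
idx_a = rng.integers(0, 128, size=256)                      # random wiring, first input
idx_b = rng.integers(0, 128, size=256)                      # random wiring, second input
out = eval_nand_layer(x_bits, idx_a, idx_b)                 # 256 gates x 64 samples in a few ops
```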
Software Dependencies: No — The paper mentions PyTorch [44] but does not specify a version number for it or other software dependencies.
Experiment Setup: Yes — "We train five sizes of Logic Tree Nets on the CIFAR-10 data set [9] using the AdamW optimizer [12], [33] with a batch size of 128 at a learning rate of 0.02." "Additional training details and hyperparameters are in Appendix A.2." "In Table 6, we summarize the hyperparameters for each model architecture configuration." "For the loss, we use different softmax temperatures τ depending on the model size." "We observe that the hyperparameter that depends the most on the data set is the learning rate η. The temperature τ, and thus the range of attainable outputs n_ℓℓ/(c·τ), has a minor dependence on the data set. We use weight decay only for the CIFAR-10 models, as it does not yield advantages for the smaller MNIST models. We note that convergence, when training with weight decay, is generally slightly slower but leads to slightly better models."
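Putting the quoted hyperparameters together, a training step consistent with the description (batch size 128, AdamW at learning rate 0.02, cross-entropy over temperature-scaled class scores) could be sketched as follows. The weight-decay value and τ below are placeholders, since the exact per-model values live in the paper's Table 6:

```python
import torch
import torch.nn.functional as F

tau = 6.5  # placeholder; the paper varies tau with model size (Table 6)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.02,
                              weight_decay=1e-2)  # decay value assumed; CIFAR-10 only

for images, targets in train_loader:  # train_loader built with batch_size=128
    # class_scores: per-class aggregates of output gate activations, shape (128, 10).
    # If the output layer already applies the temperature (as difflogic's
    # GroupSum does), drop the division below.
    class_scores = model(images)
    loss = F.cross_entropy(class_scores / tau, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```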