Learning with Logical Constraints but without Shortcut Satisfaction
Authors: Zenan Li, Zehua Liu, Yuan Yao, Jingwei Xu, Taolue Chen, Xiaoxing Ma, Jian Lü
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out experiments on four tasks, i.e., handwritten digit recognition, handwritten formula recognition, shortest distance prediction in a weighted graph, and CIFAR100 image classification. For each task, we train the model with normal cross-entropy loss on the labeled data as the baseline result, and compare our approach with PD (Nandwani et al., 2019) and DL2 (Fischer et al., 2019), which are the state-of-the-art approaches that incorporate logical constraints into the trained models. |
| Researcher Affiliation | Academia | Zenan Li1, Zehua Liu2, Yuan Yao1, Jingwei Xu1, Taolue Chen3, Xiaoxing Ma1, Jian Lü1 1State Key Lab of Novel Software Technology, Nanjing University, China 2Department of Mathematics, The University of Hong Kong, Hong Kong 3Department of Computer Science, Birkbeck, University of London, UK |
| Pseudocode | Yes | F THE ALGORITHM OF LOGICAL TRAINING... Algorithm 1 Logical Training Procedure. Initialize: w⁰ randomly; τ̄⁰ and τ⁰ uniformly; δ⁰ = 1. For t = 0, 1, . . . : draw a collection of i.i.d. data samples {(xᵢ, yᵢ)}ᵢ₌₁ᴺ; wᵗ⁺¹ ← wᵗ − η_w ∇_w L(w, δ; τ̄, τ); δᵗ⁺¹ ← (∑ᵢ₌₁ᴺ µᵗᵢ)/N; τ̄ᵗ⁺¹ ← τ̄ᵗ + η ∇_τ̄ L(w, δ; τ̄, τ); τᵗ⁺¹ ← τᵗ − η ∇_τ L(w, δ; τ̄, τ); end for. |
| Open Source Code | Yes | The code, together with the experimental data, is available at https://github.com/SoftWiser-group/NeSy-without-Shortcuts. |
| Open Datasets | Yes | In the first experiment, we construct a semi-supervised classification task by removing the labels of 6 in the MNIST dataset (LeCun et al., 1989) during training. We then apply a logical rule to predict label 6 using the rotation relation between 6 and 9 as f(x̃) = 9 ⇒ f(x) = 6, where x is the input, and x̃ denotes the result of rotating x by 180 degrees. We rewrite the above rule as the disjunction ¬(f(x̃) = 9) ∨ (f(x) = 6). We train the LeNet model on the MNIST dataset, and further validate the transferability performance of the model on the USPS dataset (Hull, 1994). |
| Dataset Splits | No | The paper describes using labeled and unlabeled data for training (e.g., '10,000 examples and an unlabeled dataset of 30,000 examples by randomly sampling from the original training data' for CIFAR100, or '2% labeled data and 80% unlabeled data' for HWF), and mentions validating transferability on USPS, but it does not consistently provide explicit, distinct training/validation/test dataset splits with percentages or counts for all experiments that are needed to reproduce the exact data partitioning. |
| Hardware Specification | Yes | The experiments were conducted on a GPU server with two Intel Xeon Gold 5118 CPUs @ 2.30GHz, 400GB RAM, and 9 GeForce RTX 2080 Ti GPUs. The server ran Ubuntu 16.04 with GNU/Linux kernel 4.4.0. |
| Software Dependencies | No | We implemented our approach via the PyTorch DL framework. For PD and DL2, we use the code provided by the respective authors. ... The server ran Ubuntu 16.04 with GNU/Linux kernel 4.4.0. The text mentions software but not specific version numbers for PyTorch or other libraries. |
| Experiment Setup | Yes | For this experiment, we used the LeNet-5 architecture, set the batch size to 128, and set the number of epochs to 60. For the baseline, DL2, and our approach, we optimized the loss using the Adam optimizer with learning rate 1e-3. |
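The alternating primal-dual structure quoted in the Pseudocode row (descent on w, δ set to the mini-batch average of the µᵢ, ascent on τ̄, descent on τ) can be sketched as below. This is a toy illustration only: it uses a quadratic stand-in for the paper's logical loss L(w, δ; τ̄, τ), and every function and variable choice beyond the update pattern (`toy_loss_grads`, the clipping that produces `mu`, the step sizes) is an assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss_grads(w, tau_bar, tau, x):
    # Illustrative stand-in gradients for L(w, delta; tau_bar, tau):
    # descent direction for w and tau, ascent direction for the dual tau_bar.
    grad_w = 2.0 * (w - x.mean())   # pulls w toward the mini-batch mean
    grad_tau_bar = w - tau_bar      # dual variable chases w
    grad_tau = tau - 0.5            # keeps tau near a fixed target
    return grad_w, grad_tau_bar, grad_tau

# Initialize: w randomly; tau_bar and tau uniformly (here: 0); delta = 1.
w, tau_bar, tau, delta = rng.normal(), 0.0, 0.0, 1.0
eta_w, eta = 0.1, 0.05              # separate step sizes, as in Algorithm 1

for t in range(200):
    x = rng.normal(loc=3.0, size=32)         # draw an i.i.d. mini-batch
    gw, gtb, gt = toy_loss_grads(w, tau_bar, tau, x)
    w = w - eta_w * gw                       # primal descent step on w
    mu = np.clip(x - w, 0.0, 1.0)            # per-sample scores (illustrative)
    delta = mu.mean()                        # delta <- (sum_i mu_i) / N
    tau_bar = tau_bar + eta * gtb            # dual ascent step on tau_bar
    tau = tau - eta * gt                     # descent step on tau
```

With this toy loss, w drifts toward the data mean while the dual variable τ̄ tracks it, mirroring how the actual algorithm balances the task loss against the logical-constraint term.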