Learning with Logical Constraints but without Shortcut Satisfaction
Authors: Zenan Li, Zehua Liu, Yuan Yao, Jingwei Xu, Taolue Chen, Xiaoxing Ma, Jian Lü
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out experiments on four tasks, i.e., handwritten digit recognition, handwritten formula recognition, shortest distance prediction in a weighted graph, and CIFAR100 image classification. For each task, we train the model with normal cross-entropy loss on the labeled data as the baseline result, and compare our approach with PD (Nandwani et al., 2019) and DL2 (Fischer et al., 2019), which are the state-of-the-art approaches that incorporate logical constraints into the trained models. |
| Researcher Affiliation | Academia | Zenan Li1, Zehua Liu2, Yuan Yao1, Jingwei Xu1, Taolue Chen3, Xiaoxing Ma1, Jian Lü1 1State Key Lab of Novel Software Technology, Nanjing University, China 2Department of Mathematics, The University of Hong Kong, Hong Kong 3Department of Computer Science, Birkbeck, University of London, UK |
| Pseudocode | Yes | F THE ALGORITHM OF LOGICAL TRAINING... Algorithm 1 Logical Training Procedure. Initialize: w⁰ randomly; τ̄⁰ and τ⁰ uniformly; δ⁰ = 1. For t = 0, 1, . . . : draw a collection of i.i.d. data samples {(xᵢ, yᵢ)}ᵢ₌₁ᴺ; wᵗ⁺¹ ← wᵗ − η_w ∇_w L(w, δ; τ̄, τ); δᵗ⁺¹ ← (∑ᵢ₌₁ᴺ µᵗᵢ)/N; τ̄ᵗ⁺¹ ← τ̄ᵗ + η ∇_τ̄ L(w, δ; τ̄, τ); τᵗ⁺¹ ← τᵗ − η ∇_τ L(w, δ; τ̄, τ); end for. |
| Open Source Code | Yes | The code, together with the experimental data, is available at https://github.com/SoftWiser-group/NeSy-without-Shortcuts. |
| Open Datasets | Yes | In the first experiment, we construct a semi-supervised classification task by removing the labels of 6 in the MNIST dataset (LeCun et al., 1989) during training. We then apply a logical rule to predict label 6 using the rotation relation between 6 and 9 as f(x̃) = 9 ⇒ f(x) = 6, where x is the input, and x̃ denotes the result of rotating x by 180 degrees. We rewrite the above rule as the disjunction ¬(f(x̃) = 9) ∨ (f(x) = 6). We train the LeNet model on the MNIST dataset, and further validate the transferability performance of the model on the USPS dataset (Hull, 1994). |
| Dataset Splits | No | The paper describes using labeled and unlabeled data for training (e.g., '10,000 examples and an unlabeled dataset of 30,000 examples by randomly sampling from the original training data' for CIFAR100, or '2% labeled data and 80% unlabeled data' for HWF), and mentions validating transferability on USPS, but it does not consistently provide explicit, distinct training/validation/test dataset splits with percentages or counts for all experiments that are needed to reproduce the exact data partitioning. |
| Hardware Specification | Yes | The experiments were conducted on a GPU server with two Intel Xeon Gold 5118 CPUs @ 2.30GHz, 400GB RAM, and 9 GeForce RTX 2080 Ti GPUs. The server ran Ubuntu 16.04 with GNU/Linux kernel 4.4.0. |
| Software Dependencies | No | We implemented our approach via the PyTorch DL framework. For PD and DL2, we use the code provided by the respective authors. ... The server ran Ubuntu 16.04 with GNU/Linux kernel 4.4.0. The text mentions software but not specific version numbers for PyTorch or other libraries. |
| Experiment Setup | Yes | For this experiment, we used the LeNet-5 architecture, set the batch size to 128, and set the number of epochs to 60. For the baseline, DL2, and our approach, we optimized the loss using the Adam optimizer with learning rate 1e-3. |
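The alternating primal-dual structure quoted in the Pseudocode row (descent on w, δ set to the mini-batch average of the µᵢ, ascent on τ̄, descent on τ) can be sketched as below. This is a toy illustration only: it uses a quadratic stand-in for the paper's logical loss L(w, δ; τ̄, τ), and every function and variable choice beyond the update pattern (`toy_loss_grads`, the clipping that produces `mu`, the step sizes) is an assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss_grads(w, tau_bar, tau, x):
    # Illustrative stand-in gradients for L(w, delta; tau_bar, tau):
    # descent direction for w and tau, ascent direction for the dual tau_bar.
    grad_w = 2.0 * (w - x.mean())   # pulls w toward the mini-batch mean
    grad_tau_bar = w - tau_bar      # dual variable chases w
    grad_tau = tau - 0.5            # keeps tau near a fixed target
    return grad_w, grad_tau_bar, grad_tau

# Initialize: w randomly; tau_bar and tau uniformly (here: 0); delta = 1.
w, tau_bar, tau, delta = rng.normal(), 0.0, 0.0, 1.0
eta_w, eta = 0.1, 0.05              # separate step sizes, as in Algorithm 1

for t in range(200):
    x = rng.normal(loc=3.0, size=32)         # draw an i.i.d. mini-batch
    gw, gtb, gt = toy_loss_grads(w, tau_bar, tau, x)
    w = w - eta_w * gw                       # primal descent step on w
    mu = np.clip(x - w, 0.0, 1.0)            # per-sample scores (illustrative)
    delta = mu.mean()                        # delta <- (sum_i mu_i) / N
    tau_bar = tau_bar + eta * gtb            # dual ascent step on tau_bar
    tau = tau - eta * gt                     # descent step on tau
```

With this toy loss, w drifts toward the data mean while the dual variable τ̄ tracks it, mirroring how the actual algorithm balances the task loss against the logical-constraint term.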