Interpretable Neural-Symbolic Concept Reasoning

Authors: Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Mateo Espinosa Zarlenga, Lucie Charlotte Magister, Alberto Tonda, Pietro Liò, Frédéric Precioso, Mateja Jamnik, Giuseppe Marra

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks, (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii) facilitates the generation of counterfactual examples providing the learnt rules as guidance. (Abstract; see also Section 4, Experiments.)
Researcher Affiliation | Academia | 1University of Cambridge, Cambridge, UK; 2Université Côte d'Azur, Inria, CNRS, I3S, Maasai, Nice, France; 3University of Siena, Siena, Italy; 4INRA, Université Paris Saclay, Thiverval-Grignon, France; 5KU Leuven, Leuven, Belgium.
Pseudocode | No | The paper describes methods in narrative text and equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available in a public repository: https://github.com/pietrobarbiero/pytorch_explain
Open Datasets | Yes | To test DCR's ability to re-discover ground-truth rules we use the MNIST-Addition dataset (Manhaeve et al., 2018). Furthermore, we evaluate our methods on two real-world benchmark datasets: the Large-scale CelebFaces Attributes dataset (CelebA; Liu et al., 2015) and the Mutagenicity dataset (Morris et al., 2020). We use the version available as part of the PyTorch Geometric (Fey & Lenssen, 2019) library. (A hedged loading sketch appears after the table.)
Dataset Splits | Yes | In all synthetic tasks, we generate datasets with 3,000 samples and use a traditional 70%-10%-20% random split for training, validation, and testing datasets, respectively. (A split sketch appears after the table.)
Hardware Specification | Yes | All of our experiments were run on a private machine with 8 Intel(R) Xeon(R) Gold 5218 CPUs (2.30GHz), 64GB of RAM, and 2 Quadro RTX 8000 Nvidia GPUs.
Software Dependencies | Yes | For our experiments, we implemented all baselines and methods in Python 3.7 and relied upon open-source libraries such as PyTorch 1.11 (Paszke et al., 2019) (BSD license) and Scikit-learn (Pedregosa et al., 2011) (BSD license). To produce the plots seen in this paper, we made use of Matplotlib 3.5 (BSD license). (An environment-check sketch appears after the table.)
Experiment Setup | Yes | For all datasets we train DCR using a Gödel t-norm semantics. We also implement the neural modules ϕ and ψ as two-layer MLPs whose hidden size is given by the size of the concept embeddings. For all synthetic datasets (i.e., XOR, Trig, Dot) and for CelebA we train DCR for 3000 epochs using a temperature of τ = 100. In Mutagenicity we train DCR for 7000 epochs using a temperature of 100. From Section B.2: All models in this task are trained for 200 epochs using a batch size of 512 and an SGD optimizer with 0.9 momentum and a learning rate of 5·10⁻³. (A hedged training-configuration sketch appears after the table.)
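
The following is a minimal sketch of how the three benchmark datasets cited in the Open Datasets row could be loaded, assuming torchvision and PyTorch Geometric are installed; the root paths and download flags are placeholders, not details taken from the paper.

```python
# Hedged sketch: loading the benchmark datasets referenced above.
# Root paths under "./data" are placeholders, not the authors' setup.
from torch_geometric.datasets import TUDataset   # Mutagenicity (TU collection)
from torchvision.datasets import CelebA, MNIST   # CelebA attributes, MNIST digits

# Mutagenicity, in the version shipped with PyTorch Geometric (Morris et al., 2020).
mutagenicity = TUDataset(root="./data/TUDataset", name="Mutagenicity")

# CelebA with its binary face attributes available as concept-style annotations.
celeba = CelebA(root="./data", split="train", target_type="attr", download=True)

# Plain MNIST digits; MNIST-Addition (Manhaeve et al., 2018) pairs two digit
# images and labels each pair with the digits' sum -- that pairing is
# task-specific preprocessing and is not reproduced here.
mnist = MNIST(root="./data", train=True, download=True)
```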
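
Next, a minimal sketch of the 70%-10%-20% random split reported for the synthetic tasks, assuming PyTorch's random_split utility; the placeholder features, labels, and seed are illustrative and not taken from the authors' code.

```python
# Hedged sketch of a 70%/10%/20% random split over 3,000 samples.
import torch
from torch.utils.data import TensorDataset, random_split

n_samples = 3000
x = torch.randn(n_samples, 10)           # placeholder inputs
y = torch.randint(0, 2, (n_samples,))    # placeholder binary labels

dataset = TensorDataset(x, y)
n_train = int(0.7 * n_samples)           # 2,100 training samples
n_val = int(0.1 * n_samples)             # 300 validation samples
n_test = n_samples - n_train - n_val     # 600 test samples (remaining 20%)

train_set, val_set, test_set = random_split(
    dataset,
    [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42),  # seed is illustrative
)
```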
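
A small environment check against the versions listed under Software Dependencies (Python 3.7, PyTorch 1.11, scikit-learn, Matplotlib 3.5); only the major/minor versions quoted above are stated in the paper, so the comparison is indicative.

```python
# Hedged sketch: print installed versions to compare against the reported ones.
import sys
import matplotlib
import sklearn
import torch

print("Python      :", sys.version.split()[0])   # paper reports 3.7
print("PyTorch     :", torch.__version__)        # paper reports 1.11
print("scikit-learn:", sklearn.__version__)      # version not pinned in the paper
print("Matplotlib  :", matplotlib.__version__)   # paper reports 3.5
```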
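
Finally, a hedged sketch of the training configuration quoted in the Experiment Setup row: a two-layer MLP standing in for the neural modules ϕ and ψ, SGD with momentum 0.9 and learning rate 5·10⁻³, batch size 512, and a temperature τ = 100. The embedding size, activation function, and output width are assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
# Hedged sketch of the reported hyper-parameters; not the authors' code.
import torch
from torch import nn

embedding_size = 16  # concept-embedding size (placeholder value)

# Two-layer MLP whose hidden width follows the concept-embedding size,
# standing in for the neural modules phi and psi. The activation and the
# output width (1) are assumptions.
mlp = nn.Sequential(
    nn.Linear(embedding_size, embedding_size),
    nn.LeakyReLU(),
    nn.Linear(embedding_size, 1),
)

temperature = 100.0  # tau, used to sharpen soft rule activations
batch_size = 512
optimizer = torch.optim.SGD(mlp.parameters(), lr=5e-3, momentum=0.9)
```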