Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning
Authors: Jiani Huang, Ziyang Li, Binghong Chen, Karan Samel, Mayur Naik, Le Song, Xujie Si
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On synthetic tasks involving mathematical and logical reasoning, Scallop scales significantly better without sacrificing accuracy compared to DeepProbLog, a principled neural logic programming approach. Scallop also scales to a newly created real-world Visual Question Answering (VQA) benchmark that requires multi-hop reasoning, achieving 84.22% accuracy and outperforming two VQA-tailored models based on Neural Module Networks and transformers by 12.42% and 21.66% respectively. |
| Researcher Affiliation | Academia | Jiani Huang University of Pennsylvania EMAIL Ziyang Li University of Pennsylvania EMAIL Binghong Chen Georgia Institute of Technology EMAIL Karan Samel Georgia Institute of Technology EMAIL Mayur Naik University of Pennsylvania EMAIL Le Song Georgia Institute of Technology EMAIL Xujie Si McGill University and CIFAR AI Chair, Mila EMAIL |
| Pseudocode | No | The paper describes algorithms and processes in narrative text and figures, but it does not include formally structured pseudocode or algorithm blocks with specific labels such as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | The source code of Scallop is available at https://github.com/scallop-lang/scallop-v1. |
| Open Datasets | Yes | The images and scene graphs are from the GQA [18] dataset and the knowledge graph is from the CRIC [16] dataset. ... Each task takes as input multiple MNIST [20] images and requires performing simple arithmetic (T1-T3) or sorting (T4-T6) over digits depicted in the given images. |
| Dataset Splits | Yes | We split the images randomly into training (60%), validation (10%), and testing (30%) sets. |
| Hardware Specification | Yes | All experiments are conducted on a machine with two 20-core Intel Xeon CPUs, four GeForce RTX 2080 Ti GPUs, and 768 GB RAM. |
| Software Dependencies | No | The paper mentions software components such as Datalog, Prolog, Sentential Decision Diagram (SDD), Mask R-CNN, ResNet, and TransE, but it does not specify version numbers for these or any programming language libraries (e.g., Python, PyTorch, TensorFlow) used in the implementation. |
| Experiment Setup | Yes | Scallop takes 92 hours to finish 15 training epochs with k = 10 and takes only 0.3 seconds on average per training sample. ... In our experimental setup, we apply the binary cross entropy loss function on the two vectors. |
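The Dataset Splits row quotes a random 60%/10%/30% train/validation/test split. The paper does not publish its splitting code, so the following is only a minimal sketch of such a seeded random split; the function name `split_dataset` and the fixed seed are assumptions, not from the paper.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.1, seed=0):
    """Randomly partition items into train/val/test sets.

    The remainder after the train and validation fractions
    (here 30%) becomes the test set.
    """
    items = list(items)               # copy so the caller's order is untouched
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        items[:n_train],                      # training set
        items[n_train:n_train + n_val],       # validation set
        items[n_train + n_val:],              # testing set
    )


train_set, val_set, test_set = split_dataset(range(100))
```

With 100 items this yields 60/10/30 examples; a different seed gives a different but equally sized partition.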
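The Experiment Setup row quotes that binary cross entropy is applied to two vectors (a predicted probability vector and a target vector). As a rough illustration of that loss, here is a plain-Python sketch; the paper's actual implementation details (framework, reduction, clamping constant) are not specified, so the `eps` clamp and mean reduction are assumptions.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy between a predicted probability
    vector and a 0/1 target vector.

    Predictions are clamped to [eps, 1 - eps] so log() stays finite.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # numerical-stability clamp
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)
```

For example, `binary_cross_entropy([0.9, 0.1], [1.0, 0.0])` evaluates to about 0.105 (i.e. -ln 0.9), and the loss grows as predictions move away from the targets.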