Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints
Authors: Weidi Xu, Jingwei Wang, Lele Xie, Jianshan He, Hongting Zhou, Taifeng Wang, Xiaopei Wan, Jingdong Chen, Chao Qu, Wei Chu
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in three kinds of tasks over images, graphs, and text show that LogicMP outperforms advanced competitors in both performance and efficiency. |
| Researcher Affiliation | Industry | (1) INFLY TECH (Shanghai) Co., Ltd; (2) Ant Group; (3) BioMap Research |
| Pseudocode | Yes | Algorithm 1: LogicMP, and Algorithm 2: PyTorch-like Code for LogicMP with Transitivity Rule |
| Open Source Code | Yes | Code is available at: https://github.com/wead-hsu/logicmp |
| Open Datasets | Yes | We evaluate LogicMP on a real-world document understanding benchmark dataset (FUNSD) (Jaume et al., 2019) with up to 262K mutually-dependent variables and show that it outperforms previous state-of-the-art methods (Sec. 5.1). For the second task (Sec. 5.2), we conduct experiments on relatively large datasets in the MLN literature, including UW-CSE (Richardson & Domingos, 2006) and Cora (Singla & Domingos, 2005). Finally, we evaluate LogicMP on a sequence labeling task (CoNLL-2003) (Sang & Meulder, 2003) and show that it can leverage task-specific rules to improve performance over competitors (Sec. 5.3). |
| Dataset Splits | No | The paper states '149 training samples and 50 test samples' for FUNSD, but does not explicitly mention a validation split or provide specific percentages/counts for validation sets across other datasets. |
| Hardware Specification | Yes | The experiments were conducted on a basic machine with a 132GB V100 GPU and an Intel E5-2682 v4 CPU at 2.50GHz with 32GB RAM. The experiments were conducted on a basic machine with a 16GB P100 GPU and an Intel E5-2682 v4 CPU at 2.50GHz with 32GB RAM. |
| Software Dependencies | No | The paper mentions software such as 'NumPy', 'PyTorch', 'TensorFlow', and 'PySDD (Darwiche, 2011)', but does not provide version numbers for any of them, which limits reproducibility. |
| Experiment Setup | Yes | The model is trained with the Adam optimizer with a learning rate of 5e-4. |
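
For readers who want to see what the reported optimizer setting means in practice, the following is a minimal sketch of a single-parameter Adam update using the learning rate of 5e-4 quoted above. It is not the authors' training code: the `beta1`, `beta2`, and `eps` values are assumed common defaults (the quoted setup states only the learning rate), and the quadratic objective and the function names `adam_step` and `minimize_toy` are hypothetical and used purely for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=5e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    lr matches the learning rate quoted in the table; beta1, beta2, and eps
    are ASSUMED defaults, not values reported for the paper.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moment estimates (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update scaled by the learning rate.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_toy(theta0=1.0, steps=1):
    """Run Adam on the hypothetical toy objective f(theta) = theta**2."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # derivative of theta**2
        theta, m, v = adam_step(theta, grad, m, v, t)
    return theta


if __name__ == "__main__":
    print(minimize_toy(1.0, steps=1))
    print(minimize_toy(1.0, steps=100))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate itself. This is why the learning rate is the setting that matters most when attempting to reproduce the reported training configuration.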