Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Xeggora: Exploiting Immune-to-Evidence Symmetries with Full Aggregation in Statistical Relational Models

Authors: Mohammad Mahdi Amirian, Saeed Shiry Ghidary

JAIR 2019

Reproducibility Variable Result LLM Response
Research Type Experimental This section is composed of two main evaluation reports. In order to design efficient heuristics for FCA as described earlier, the ILP resolution phase was empirically analyzed and evaluated on the artificial data. Furthermore, the whole algorithm was experimented on real benchmark MLNs.
Researcher Affiliation Academia Mohammad Mahdi Amirian EMAIL Computer Engineering & Information Technology Department Amirkabir University of Technology, Tehran, Iran Saeed Shiry Ghidary EMAIL Math & Computer Science Department Amirkabir University of Technology, Tehran, Iran
Pseudocode Yes Algorithm CHOOSEFORAGGREGATION
Input ℱ: a first-order disjunctive clause
Input 𝒢: a SQL table, containing all essential ground clauses of ℱ
Output: appropriate clustering scheme as the target for aggregation
1: L ← Literals_of(ℱ)
2: 𝒱 ← Variables_of(L)
3: if |L| = 1
4: candidate_sets ← { }
5: else
6: candidate_sets ← {}
7: foreach non-empty V ⊆ 𝒱
8: identical_literals ← {ℓ_i ∈ L | Variables_of(ℓ_i) V}
9: if identical_literals ∉ candidate_sets and identical_literals L
10: add identical_literals to candidate_sets
11: if candidate_sets contains any sets of literals with the size of |L| 1
12: return best of them greedily to be further first-order aggregated
13: else
14: query ← "SELECT "
15: foreach identical_part ∈ candidate_sets
16: query ← query + "COUNT (DISTINCT " + Serialize(identical_part) + ") as " + Caption(identical_part) [+ ","] // except for the last loop
17: query ← query + " FROM " + 𝒢
18: execute query into #clusters
19: best_candidate_sets ← {argmin_{#clusters} candidate_sets}
20: return argmax_{cardinality} best_candidate_sets
Open Source Code Yes We release the code as an open source project for further investigation. The source code is available at https://github.com/amirian/xeggora.
Open Datasets Yes RC was built for the classification problem on the CORA (McCallum, Nigam, Rennie, & Seymore, 2000) dataset. LP performs prediction of the relations holding between UW-CSE students, faculty, and staff (Richardson & Domingos, 2006). IE (Poon & Domingos, 2007) extracts database records from parsed sources. PR contains information on the yeast protein location, function, class, phenotype, and enzymes, from the MIPS (Munich Information center for Protein Sequence) Comprehensive Yeast Genome Database, as of February 2005 (Mewes et al., 2000). ER is used to find records corresponding to the same real-world entity (Singla & Domingos, 2006).
Dataset Splits No The paper mentions several benchmark datasets (CORA, UW-CSE, MIPS, EKAW, etc.) and their characteristics in Table 1, such as '# evidence atoms' and '# clauses'. However, it does not explicitly provide details about how these datasets were partitioned into training, validation, or test sets (e.g., specific percentages, absolute counts, or citations to predefined splits for reproducibility).
Hardware Specification Yes All experiments were performed on a PC with 8 GB RAM and 4 cores with 2.1 GHz.
Software Dependencies Yes In both evaluations, Gurobi version 8 was employed as the ILP solver to find an exact or approximate solution based on a gap parameter (bound of relative error).
Experiment Setup Yes The gap was set to 10^-6 to reach the exact solution in the experiments it is reachable. For each benchmark with intractable exact inference, we tried various gap ranges to find the best approximation in admissible time. All experiments were performed on a PC with 8 GB RAM and 4 cores with 2.1 GHz. In addition to the solver's gap bound parameter, we set its time limit to stop optimization if the gap is not reached in 10 minutes.
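
The final steps of the CHOOSEFORAGGREGATION pseudocode quoted in the Pseudocode row (one COUNT(DISTINCT ...) per candidate set, then the argmin over cluster counts and the argmax over cardinality) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, purely for illustration, that each literal maps to one column of the ground-clause table, that Serialize can be approximated by concatenating columns with a separator, and that the candidate sets are already given. The names `serialize`, `caption`, `choose_by_cluster_count`, the table `ground`, and its columns are all hypothetical.

```python
import sqlite3


def serialize(columns):
    # Hypothetical stand-in for Serialize: concatenate the columns of one
    # candidate set into a single SQL expression, separated by '|'.
    return " || '|' || ".join(f'"{c}"' for c in columns)


def caption(columns):
    # Hypothetical stand-in for Caption: a column alias for the count.
    return "n_" + "_".join(columns)


def choose_by_cluster_count(conn, table, candidate_sets):
    # Build one SELECT with a COUNT(DISTINCT ...) term per candidate set.
    parts = [
        f'COUNT(DISTINCT {serialize(c)}) AS "{caption(c)}"'
        for c in candidate_sets
    ]
    query = "SELECT " + ", ".join(parts) + f' FROM "{table}"'
    counts = conn.execute(query).fetchone()

    # Keep the candidates that yield the fewest clusters (argmin) ...
    fewest = min(counts)
    best = [c for c, n in zip(candidate_sets, counts) if n == fewest]
    # ... and among those, return the one with the most literals (argmax).
    return max(best, key=len)


# Toy data: lit_a and lit_b always co-vary, lit_c is unique per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ground (lit_a TEXT, lit_b TEXT, lit_c TEXT)")
conn.executemany(
    "INSERT INTO ground VALUES (?, ?, ?)",
    [
        ("a1", "b1", "c1"),
        ("a1", "b1", "c2"),
        ("a2", "b2", "c3"),
        ("a2", "b2", "c4"),
    ],
)
candidates = [("lit_a",), ("lit_b",), ("lit_a", "lit_b"), ("lit_c",)]
chosen = choose_by_cluster_count(conn, "ground", candidates)
print(chosen)
```

In this toy table the candidates `lit_a`, `lit_b`, and the pair `(lit_a, lit_b)` each produce two clusters while `lit_c` produces four, so the tie is broken in favor of the pair because it has the largest cardinality.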