Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

3DPGS: 3D Probabilistic Graph Search for Archaeological Piece Grouping

Authors: Junfeng Cheng, Yingkai Yang, Tania Stathaki

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we propose a new benchmark called Archaeological Piece Grouping. ... We propose a new framework called 3D Probabilistic Graph Search (3DPGS) to address the problem of grouping mixed archaeological pieces. ... Our framework significantly outperforms other baselines. ... We demonstrate the quantitative results in Table 2. The results show that our algorithm outperforms other baselines by a large margin, particularly on the metric of GL-F1. ... Our ablation study contains two parts. The first part (Table 3) is the architecture ablation, which tests the influences of different important modules. The second part (Fig. 5) is the research on the proposed PGS algorithm.
Researcher Affiliation	Academia	Junfeng Cheng, Yingkai Yang, Tania Stathaki; Department of Electrical and Electronic Engineering, Imperial College London, London, UK, SW7 2AZ
Pseudocode	Yes	Algorithm 1: Probabilistic Graph Search (PGS). Input: Probabilistic Matching Graph MAT G, Probability Threshold TP. Output: Predicted Groups $\hat{G} = \{\hat{G}_k\}_{k=1}^{K}$
Open Source Code	Yes	Code https://github.com/J-F-Cheng/3dpgs-grouping
Open Datasets	Yes	For our ArcPie dataset, the original mesh files were collected from Scan The World (2024), the world's largest repository of free, 3D printable cultural objects, created from global 3D scan data. ... Additionally, we also use the same mixing method discussed here to create the mixed Breaking-Bad Artifact dataset Sellán et al. (2022).
Dataset Splits	Yes	To evaluate the generalization ability of algorithms across various types of archaeological objects, we further divide the collected data into Seen and Unseen groups based on their categories. The Seen objects are used in both training and testing, while the Unseen objects are used solely in testing. ... Table 1: The statistics of our ArcPie dataset. We separate our shapes into Seen and Unseen categories. Seen categories are shapes used in both training and testing, while Unseen are the shapes that are only used for testing.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments. It mentions simulation methods but no specific hardware for the proposed framework's execution or training.
Software Dependencies	No	The paper mentions several deep learning architectures and techniques like PointNet (Qi et al. 2017a) and EdgeConv-based GNN (Wang et al. 2019), but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication.
Experiment Setup	Yes	In the training procedure, we use Mean Squared Error (MSE) as our loss function. ... To determine whether pieces have sufficient affinity to be considered in one group, we use a parameter called the Probability Threshold (TP). ... The ablation study in Fig. 5 shows that TP has a significant impact on the performance of 3DPGS. When TP is set to 0.65 or 0.75, 3DPGS achieves relatively high performance across different metrics. We apply 0.75 in our experiments.
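
To illustrate the role of the Probability Threshold (TP) quoted in the table above, the following minimal sketch groups pieces from a matrix of pairwise matching probabilities. It is not the paper's PGS algorithm, whose steps are not reproduced on this page. It is a simplified stand-in that keeps every edge with probability at least TP and returns the connected components. The function name `group_by_threshold` and the example matrix are hypothetical; only the default value 0.75 comes from the quoted experiment setup.

```python
from typing import Dict, List, Sequence


def group_by_threshold(
    prob: Sequence[Sequence[float]], tp: float = 0.75
) -> List[List[int]]:
    """Group piece indices linked by pairwise probabilities >= tp.

    Assumption: a simplified stand-in (connected components via
    union-find), NOT the Probabilistic Graph Search from the paper.
    """
    n = len(prob)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair whose probability reaches tp.
    for i in range(n):
        for j in range(i + 1, n):
            if prob[i][j] >= tp:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Collect the members of each component, keyed by its root.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric matching probabilities for four pieces.
example = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(group_by_threshold(example))        # pieces 0-1 and 2-3 pair up
print(group_by_threshold(example, 0.95))  # stricter threshold splits all
```

Raising the threshold can only split groups and lowering it can only merge them, which is one intuition for why a grouping method of this kind would be sensitive to the choice of TP.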