Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Contextual Tokenization for Graph Inverted Indices

Authors: Pritish Chakraborty, Indradyumna Roy, Soumen Chakrabarti, Abir De

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that CORGII provides better trade-offs between accuracy and efficiency, compared to several baselines. Code is in: https://github.com/structlearning/corgii. [...] 4 Experiments We assess the effectiveness of CORGII against several baselines on real-world graph datasets and analyze the effect of different components of CORGII.
Researcher Affiliation	Academia	Pritish Chakraborty IIT Bombay Indradyumna Roy IIT Bombay Soumen Chakrabarti IIT Bombay Abir De IIT Bombay Emails: EMAIL
Pseudocode	Yes	Figure 2: (a) preprocessing and (b) query-time components of CORGII. 1: input: graph corpus C, training queries {Gq} with relevance labels {yqc} 2: Train GTNet 3: for each query-corpus pair (Gq, Gc) do 4: Compute Zq = GTNet(Gq) (Eq. (2)) 5: Compute Zc = GTNet(Gc) (Eq. (3)) 6: Compute Chamfer(Gq, Gc) (Eq. (4)) 7: Train GTNet by minimizing margin-based ranking loss on Chamfer(Gq, Gc) (Eq. (5)) 8: Train impact network 9: for each query-corpus pair (Gq, Gc) do 10: Compute Zq = GTNet(Gq) 11: Compute binary embeddings 12: {bzq(u)} = ⟦Zq > 0.5⟧ 13: compute impact scores of all query graphs 14: compute Simpact(Gq, Gc) (Eq. (9)) 15: Train Impactψ network by minimizing margin-based ranking loss on Simpact(Gq, Gc) (Eq. (10))
Open Source Code	Yes	Code is in: https://github.com/structlearning/corgii.
Open Datasets	Yes	We evaluate CORGII on four datasets from the TU benchmark suite [41]: PTC-FR, PTC-FM, COX2, and PTC-MR, which are also used in existing works on graph matching [9, 10]. ... [41] Christopher Morris, Nils M Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020.
Dataset Splits	Yes	We split Q in 60:20:20 train (Qtrain), dev (Qdev) and test (Qtest) folds. For each query in Qtest, we retrieve the corpus graphs Rq that are marked relevant by the corresponding model. [...] The queryset is split such that |Qtrain| = 300, |Qdev| = 100 and |Qtest| = 100.
Hardware Specification	Yes	All experiments were conducted on an in-house NAS server equipped with seven 48GB RTX A6000 GPUs. All model training is done on GPU memory. Further, the server is equipped with a 96-core CPU and a maximum storage of 20TB, and runs Debian v6.1.
Software Dependencies	No	All experiments are run with a fixed random seed of 42 across libraries and frameworks. We leverage PyTorch's deterministic execution setting and cuBLAS workspace configuration to ensure reproducible execution.
Experiment Setup	Yes	Optimization and Early Stopping. We train both models using the Adam optimizer with a learning rate of 1 x 10^-3 and a batch size of 3000. During GTNet training, early stopping is performed at the sub-epoch level (i.e., across batches) with a patience of 30 steps and validation every 30 steps. For Impactψ, early stopping is applied at the epoch level with a maximum of 20,000 epochs and patience set to 50. Validation is conducted every epoch, with a default tolerance threshold of 5 x 10^-3. [...] Margin Hyperparameter Tuning. For the Chamfer-based ranking loss in Eq. (5), we experiment with margin values of {0.01, 0.1, 1.0, 10, 30}. The best-performing margins are 10 for PTC-FR and PTC-FM, and 30 for COX2 and PTC-MR. For the impact network loss in Eq. (10), tested margins include {0.01, 0.1, 1.0}. Margins of 0.01, 0.01, 1.0, and 0.1 work best for PTC-FR, PTC-FM, COX2, and PTC-MR, respectively.
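
The Dataset Splits row reports a 60:20:20 split of the query set into 300 train, 100 dev and 100 test queries. The sketch below shows one way such a split could be produced. The function name split_queries, the shuffle-then-slice procedure, and the reuse of seed 42 for the shuffle are assumptions made for illustration; the entry does not say how the authors assigned queries to folds.

```python
import random


def split_queries(queries, seed=42, fractions=(0.6, 0.2, 0.2)):
    """Shuffle queries with a fixed seed and cut them into train/dev/test folds.

    Hypothetical helper: the shuffle and the seed are illustrative assumptions.
    """
    items = list(queries)
    # A private Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_dev = round(fractions[1] * len(items))
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # the remainder goes to the test fold
    return train, dev, test


# 500 query ids, matching the 300 + 100 + 100 totals reported above.
train, dev, test = split_queries(range(500))
print(len(train), len(dev), len(test))
```

Calling the function again with the same seed returns the same folds, which is what makes a reported split repeatable.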
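
The Software Dependencies row mentions a fixed seed of 42, PyTorch's deterministic execution setting, and a cuBLAS workspace configuration. A minimal sketch of what that setup commonly looks like follows. The helper name seed_everything and the workspace value ":4096:8" are assumptions (the entry gives neither); ":4096:8" is one of the values listed in PyTorch's reproducibility notes. NumPy and PyTorch are imported lazily so the sketch also runs where they are not installed.

```python
import os
import random


def seed_everything(seed=42, cublas_workspace=":4096:8"):
    """Seed the available RNGs and request deterministic PyTorch kernels.

    Hypothetical helper; returns True if PyTorch was found and configured.
    """
    # cuBLAS reads this variable when it initializes, so set it before CUDA work.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = cublas_workspace
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
    except ImportError:
        return False
    torch.manual_seed(seed)
    # Raise an error if an operation has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels between runs.
    torch.backends.cudnn.benchmark = False
    return True


seed_everything(42)
```

Seeding alone does not pin library versions, which is the gap the "No" result in that row points to.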
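
The Experiment Setup row describes early stopping with a patience value (30 validation steps for GTNet, 50 epochs for the impact network) and a tolerance threshold of 5 x 10^-3. The sketch below illustrates one plausible reading of that rule. The class name EarlyStopper, the higher-is-better score convention, and the interpretation of the tolerance as a minimum required improvement are assumptions; the entry does not define them.

```python
class EarlyStopper:
    """Patience-based early stopping (hypothetical helper, higher score is better)."""

    def __init__(self, patience, tolerance=5e-3):
        self.patience = patience
        self.tolerance = tolerance  # assumed meaning: minimum improvement that counts
        self.best = float("-inf")
        self.bad_checks = 0

    def update(self, score):
        """Record one validation score; return True when training should stop."""
        if score > self.best + self.tolerance:
            self.best = score
            self.bad_checks = 0  # a real improvement resets the counter
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Toy run with patience 3: the last three scores improve by less than the tolerance.
stopper = EarlyStopper(patience=3)
decisions = [stopper.update(s) for s in [0.5, 0.6, 0.601, 0.602, 0.603]]
print(decisions)
```

For the settings in the row, the same object would be built with patience=30 and updated at each validation step for GTNet, or with patience=50 and updated once per epoch for the impact network.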