Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant

Authors: Jonas Spinner, Luigi Favaro, Peter Lippmann, Sebastian Pitz, Gerrit Gerhartz, Tilman Plehn, Fred A. Hamprecht

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In several experiments, we demonstrate the efficacy of exact Lorentz equivariance by achieving state-of-the-art or competitive results using a powerful Lorentz-equivariant transformer and by making several domain-specific networks Lorentz-equivariant. We now demonstrate the effectiveness of Lorentz Local Canonicalization (LLoCa) for a range of different architectures on two relevant tasks in HEP. We start from the classification of jets, or jet tagging. Then, we present extensive studies on QFT amplitude regression.
Researcher Affiliation	Academia	¹ITP, Heidelberg University, Germany; ²CP3, UCLouvain, Belgium; ³IWR, Heidelberg University, Germany
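Pseudocode	Yes	Algorithm 1: Local reference frames via polar decomposition. Require: $v_0, v_1, v_2 \in \mathbb{R}^4$ with $v_i' = \Lambda v_i$, $\langle v_0, v_0 \rangle > 0$. Ensure: $L^T g L = g$ and $L \to L' = L \Lambda^{-1}$. 1: $B \leftarrow B(v_0)$ using Eq. (2). 2: $w_k \leftarrow B v_k$ for $k = 1, 2$. 3: $\vec{u}_1, \vec{u}_2, \vec{u}_3 \leftarrow \mathrm{GramSchmidt}(\vec{w}_1, \vec{w}_2)$. 4: $\tilde{R} \leftarrow (\vec{u}_1, \vec{u}_2, \vec{u}_3)^T$. 5: $R \leftarrow \begin{pmatrix} 1 & 0^T \\ 0 & \tilde{R} \end{pmatrix}$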
Open Source Code	Yes	Our implementation of LLoCa is publicly available on https://github.com/heidelberg-hepml/lloca, and experiments can be reproduced with https://github.com/heidelberg-hepml/lloca-experiments.
Open Datasets	Yes	The amplitude regression datasets are publicly available on https://zenodo.org/records/16793011. We use the JetClass tagging dataset of Qu et al. [38], available at https://zenodo.org/records/6619768 under a CC-BY 4.0 license. The data samples are organized as point clouds, simulated in the CMS experiment environment at detector level. For details on the simulation process, see Qu et al. [38].
Dataset Splits	Yes	The Z + {1, 2, 3}g datasets contain 10M events each, while we generate 100M events for the more challenging Z + 4g dataset to complete our scaling studies. For validation and testing, we use the same independent dataset as in [42], from which we use 100k events for validation and 500k events for the final evaluation. ... The dataset comprises 10 equally represented classes and is split into 100 million events for training, 20 million for testing, and 5 million for validation.
Hardware Specification	Yes	The models are trained on a single A100 GPU. Training times are measured within our code environment on an H100 GPU, excluding overhead from validation and testing.
Software Dependencies	No	We use PyTorch's ReduceLROnPlateau learning rate scheduler to decrease the learning rate by a factor 0.3 if no improvements in the validation loss are observed in the last 20 validations. We use PyTorch's torch.utils.flop_counter.FlopCounterMode.
Experiment Setup	Yes	All models are trained with a batch size of 1024 and using the Adam optimizer with β = [0.99, 0.999] to optimize the network weights for 2 × 10⁵ iterations. We use PyTorch's ReduceLROnPlateau learning rate scheduler to decrease the learning rate by a factor 0.3 if no improvements in the validation loss are observed in the last 20 validations. We validate our models every 10³ iterations; if 10³ iterations are less than 50 epochs then we validate every 50 epochs. ... For both LLoCa-ParT and LLoCa-ParticleNet, we adopt the same training hyperparameters as the official implementations, without any additional tuning. Specifically, models are trained for 1,000,000 iterations with a batch size of 512, using the Ranger optimizer as implemented in the official ParT and ParticleNet repositories. A constant learning rate is applied for the first 700,000 iterations, followed by exponential decay. We use a learning rate of 0.001 for all networks. Our Transformer and LLoCa-Transformer models are trained with the same number of iterations, batch size, and initial learning rate as LLoCa-ParT, but use the AdamW optimizer in conjunction with a cosine annealing learning rate schedule.
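
Illustrative sketch (not taken from the paper's repositories): the steps quoted from Algorithm 1 in the Pseudocode row can be written in plain Python roughly as follows. Two points are assumptions rather than quotes. First, Eq. (2) is not reproduced on this page, so `rest_frame_boost` uses the standard textbook boost into the rest frame of a future-pointing timelike vector with metric signature (+, -, -, -). Second, the quoted algorithm stops at step 5; composing the result as L = R B is assumed here from the "polar decomposition" in the algorithm's title. All function names are hypothetical.

```python
import math

G = [1.0, -1.0, -1.0, -1.0]  # diagonal of the Minkowski metric (assumed signature)

def minkowski_sq(v):
    return sum(g * x * x for g, x in zip(G, v))

def rest_frame_boost(v0):
    # Assumed stand-in for Eq. (2): standard boost mapping v0 to (m, 0, 0, 0).
    # Requires <v0, v0> > 0 and v0[0] > 0.
    m = math.sqrt(minkowski_sq(v0))
    e, p = v0[0], v0[1:]
    B = [[0.0] * 4 for _ in range(4)]
    B[0][0] = e / m
    for i in range(3):
        B[0][i + 1] = B[i + 1][0] = -p[i] / m
        for j in range(3):
            B[i + 1][j + 1] = (1.0 if i == j else 0.0) + p[i] * p[j] / (m * (e + m))
    return B

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(4)) for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]

def gram_schmidt(w1, w2):
    # Orthonormalize two spatial 3-vectors; the third axis is their cross product.
    u1 = normalize(w1)
    d = sum(a * b for a, b in zip(w2, u1))
    u2 = normalize([a - d * b for a, b in zip(w2, u1)])
    u3 = [u1[1] * u2[2] - u1[2] * u2[1],
          u1[2] * u2[0] - u1[0] * u2[2],
          u1[0] * u2[1] - u1[1] * u2[0]]
    return u1, u2, u3

def local_frame(v0, v1, v2):
    B = rest_frame_boost(v0)                          # step 1
    w1, w2 = matvec(B, v1)[1:], matvec(B, v2)[1:]     # step 2 (spatial parts)
    u1, u2, u3 = gram_schmidt(w1, w2)                 # step 3
    R = [[1.0, 0.0, 0.0, 0.0]] + [[0.0] + u for u in (u1, u2, u3)]  # steps 4-5
    return matmul(R, B)                               # assumed final step: L = R B
```

A quick sanity check under these assumptions is to verify numerically that the returned matrix satisfies the "Ensure" condition L^T g L = g and that it maps v0 to its rest frame.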
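
Illustrative sketch: the learning-rate rule quoted in the Experiment Setup row (multiply the learning rate by 0.3 when the validation loss has not improved in the last 20 validations) can be mimicked in a few lines of plain Python. This is a simplified stand-in written for illustration, not PyTorch's ReduceLROnPlateau, whose exact patience counting and thresholds may differ. The class name `PlateauRule` and the starting learning rate are hypothetical.

```python
class PlateauRule:
    """Scale lr by `factor` after `patience` validations without improvement."""

    def __init__(self, lr, factor=0.3, patience=20):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0  # validations since the last improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        if self.stale >= self.patience:
            self.lr *= self.factor
            self.stale = 0
        return self.lr


rule = PlateauRule(lr=1e-3)  # placeholder starting value for illustration
```

Calling `rule.step(val_loss)` once per validation returns the learning rate to use until the next validation.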