Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Evaluation of Trajectory Distribution Predictions with Energy Score
Authors: Novin Shahroudi, Mihkel Lepson, Meelis Kull
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments highlighting the importance of adopting proper scoring rules as a distribution-aware evaluation of trajectory distribution predictions. We empirically demonstrate the consequence of adopting an improper score for evaluation and how it can go wrong in Section 6.1 through a showcase of propriety. We also empirically demonstrate the effect of the trajectory size K in Section 6.2. To see the energy score in action, we perform a real data experiment on the ETH/UCY dataset (Ess et al., 2007) in Section 6.3. |
| Researcher Affiliation | Academia | Institute of Computer Science, University of Tartu, Tartu, Tartu County, Estonia. Correspondence to: Novin Shahroudi <EMAIL>, Mihkel Lepson <EMAIL>, Meelis Kull <EMAIL>. |
| Pseudocode | No | The paper provides mathematical definitions and descriptions of metrics but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our experiments is available at https://github.com/novinsh/trajectoryprediction-eval-with-energy-score. |
| Open Datasets | Yes | To see the energy score in action, we perform a real data experiment on the ETH/UCY dataset (Ess et al., 2007) in Section 6.3. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits for its own experiments. It focuses on evaluating pre-trained models on the ETH/UCY dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or cloud configurations) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We set the ground truth parameters to be μ_t = 1, σ_t = 0.2, a_t = 0, and b_t = 0 for t = {1, 2, 3}. Then, we generate N = 5000 observations and consider K = {10, 20, 50, 100, 300} to generate predictions from the same process. |
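
For readers unfamiliar with the metric named in the title, the following is a minimal sketch of the standard sample-based energy score estimate, ES ≈ mean‖x_k − y‖ − ½·mean‖x_k − x_l‖, applied to a toy setup loosely modeled on the Experiment Setup row. It is not the authors' code (see the repository linked in the table for that). The function name `energy_score`, the helper `mean_score_per_k`, the independent Gaussian per time step, and the reduced N = 200 are all assumptions made for illustration; the roles of a_t and b_t are not described on this page, so the sketch leaves them out.

```python
import numpy as np


def energy_score(samples: np.ndarray, observation: np.ndarray) -> float:
    """Sample-based energy score estimate (lower is better).

    samples: array of shape (K, D), K predicted trajectories flattened to D values.
    observation: array of shape (D,), the observed trajectory.
    """
    # Mean Euclidean distance between each predicted sample and the observation.
    term_obs = np.linalg.norm(samples - observation, axis=1).mean()
    # Mean Euclidean distance over all K*K ordered sample pairs (diagonal is zero).
    diffs = samples[:, None, :] - samples[None, :, :]
    term_pair = np.linalg.norm(diffs, axis=2).mean()
    return float(term_obs - 0.5 * term_pair)


def mean_score_per_k(n_obs: int = 200, seed: int = 0) -> dict:
    """Average energy score for each K when predictions match the true process.

    ASSUMPTION: each of the 3 time steps is an independent Gaussian with
    mean 1 and standard deviation 0.2; this is a stand-in, not the paper's process.
    """
    rng = np.random.default_rng(seed)
    n_steps, mu, sigma = 3, 1.0, 0.2
    results = {}
    for k in (10, 20, 50, 100, 300):
        observations = rng.normal(mu, sigma, size=(n_obs, n_steps))
        predictions = rng.normal(mu, sigma, size=(n_obs, k, n_steps))
        scores = [energy_score(predictions[i], observations[i]) for i in range(n_obs)]
        results[k] = float(np.mean(scores))
    return results


if __name__ == "__main__":
    for k, score in mean_score_per_k().items():
        print(f"K={k:>3}  mean energy score={score:.4f}")
```

The pairwise term here averages over all K² ordered pairs, including the zero diagonal; some implementations divide by K(K − 1) instead, which changes the estimate slightly for small K.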