Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Predicting Any Human Trajectory In Context

Authors: Ryo Fujii, Hideo Saito, Ryo Hachiuma

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that TrajICL achieves remarkable adaptation across both in-domain and cross-domain scenarios, outperforming even fine-tuned approaches across multiple public benchmarks.
Researcher Affiliation	Collaboration	Ryo Fujii (1, 2), Hideo Saito (1, 2), Ryo Hachiuma (3); (1) Keio University, (2) Keio AI Research Center, (3) NVIDIA
Pseudocode	No	The paper describes the model architecture and methods using text and mathematical formulations but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Justification: we will open source the code and models.
Open Datasets	Yes	We train our model using the MOTSynth [14] dataset, a synthetic pedestrian detection and tracking dataset... In addition to in-domain evaluation on MOTSynth, we assess our method on five widely adopted datasets for cross-domain evaluation: JRDB [45] (in both image coordinates, JRDB-Image, and world coordinates, JRDB-World), WildTrack [8], SDD [53], and JTA [15].
Dataset Splits	Yes	For our experiments, we use a subset of 424 scenes for training and 107 scenes for evaluation, all captured with a static camera. ... The earliest 80% of identities are used to construct the example pool, while the remaining 20% are reserved for evaluation.
Hardware Specification	Yes	We set the batch size to 16 and train the model using one NVIDIA RTX A6000 GPU. ... The inference cost was computed on a machine with an Intel Xeon W-3235 CPU, 128GB of RAM, and an NVIDIA Titan RTX GPU, with GPU memory measured using a batch size of one.
Software Dependencies	No	The paper mentions using the AdamW optimizer [40], cosine annealing scheduler [39], and a Transformer encoder [61], but does not provide specific version numbers for software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version, CUDA version).
Experiment Setup	Yes	In the first stage, we train the model using the AdamW optimizer [40] with a base learning rate of 1 × 10⁻³ for 100 epochs. We perform a 3-epoch warmup and decay the learning rate to 0 throughout training using the cosine annealing scheduler [39]. In the second stage, we train the model for 400 epochs, with a 12-epoch warmup and the cosine annealing scheduler, following the same setup as in the first stage. We set M (number of in-context examples) to eight. ... We set the batch size to 16... The model configuration for Predictor consists of three layers and four attention heads, with a model dimension of d = 128. We employ Leaky ReLU functions as the activation function.
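
The learning-rate schedule quoted under Experiment Setup (a short warmup followed by cosine annealing to 0) can be sketched as below. This is an illustrative reading, not the authors' code. It assumes the warmup is linear, the rate is updated once per epoch, the base learning rate is 1 × 10⁻³, and the second stage reuses that base rate. The function name `lr_at_epoch` and its parameters are hypothetical.

```python
import math


def lr_at_epoch(epoch, total_epochs, warmup_epochs, base_lr=1e-3):
    """Learning rate at a 0-indexed epoch: warmup, then cosine decay to 0.

    Assumptions (not stated in the source): the warmup is linear and the
    learning rate is stepped once per epoch rather than per iteration.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    # Cosine annealing from base_lr (progress = 0) down to 0 (progress = 1).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Stage 1 as quoted: 100 epochs with a 3-epoch warmup.
stage1 = [lr_at_epoch(e, total_epochs=100, warmup_epochs=3) for e in range(100)]

# Stage 2 as quoted: 400 epochs with a 12-epoch warmup (same base rate assumed).
stage2 = [lr_at_epoch(e, total_epochs=400, warmup_epochs=12) for e in range(400)]
```

Under these assumptions each stage peaks at the base rate when the warmup ends and then decays monotonically. A per-iteration schedule would have the same shape with finer steps.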