Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Authors: Haoyu He, Haozheng Luo, Yan Chen, Qi (Cheems) Wang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. |
| Researcher Affiliation | Academia | Northeastern University Northwestern University EMAIL EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 RHYTHM Overall Pipeline |
| Open Source Code | Yes | Code is publicly available at https://github.com/he-h/rhythm. |
| Open Datasets | Yes | We evaluate our approach on three real-world datasets collected from the cities of Kumamoto, Sapporo, and Hiroshima sourced from YJMob100K [74]. |
| Dataset Splits | Yes | Each dataset is divided into training, validation, and test sets based on days, with 70%, 20%, and 10% of the data allocated to each set, respectively. |
| Hardware Specification | Yes | We perform all experiments using a single NVIDIA A100 GPU with 40GB of memory and a 24-core Intel(R) Xeon(R) Gold 6338 CPU operating at 2.00GHz. |
| Software Dependencies | No | Our code is developed in PyTorch [52] and utilizes the Hugging Face Transformer Library for experimental execution. |
| Experiment Setup | Yes | Embeddings for time-of-day and day-of-week, the categorical location embedding, and the coordinate projection all use hidden dimensions of 128, 128, 256, and 128, respectively. We use AdamW [41] as the optimizer. For model training, we conduct a systematic hyperparameter search, exploring learning rates from the set {1e-4, 3e-4, 5e-4} and weight decay values from {0, 0.001, 0.01}. Through extensive validation experiments, we determine the optimal configuration for each dataset. All models are trained with a consistent batch size of 64 across all datasets for fair comparison. The final hyperparameter settings are selected based on performance on the validation set. |
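
The Dataset Splits and Experiment Setup rows above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function names (`split_days`, `hyperparameter_grid`, `select_best`) and the `validate` callback are hypothetical. The excerpts do not say whether days are assigned chronologically or at random, so a chronological split is assumed here. Only the 70/20/10 ratios, the learning-rate and weight-decay sets, and the batch size of 64 come from the quoted text.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
LEARNING_RATES = [1e-4, 3e-4, 5e-4]
WEIGHT_DECAYS = [0, 0.001, 0.01]
BATCH_SIZE = 64


def split_days(days, train_frac=0.7, val_frac=0.2):
    """Split day indices into train/validation/test sets by day.

    Assumption: days are assigned in chronological order; the quoted
    text only says the split is "based on days".
    """
    days = sorted(days)
    n_train = round(len(days) * train_frac)
    n_val = round(len(days) * val_frac)
    train = days[:n_train]
    val = days[n_train:n_train + n_val]
    test = days[n_train + n_val:]  # remaining days, roughly 10%
    return train, val, test


def hyperparameter_grid():
    """Enumerate every (learning rate, weight decay) pair at batch size 64."""
    return [
        {"lr": lr, "weight_decay": wd, "batch_size": BATCH_SIZE}
        for lr, wd in product(LEARNING_RATES, WEIGHT_DECAYS)
    ]


def select_best(configs, validate):
    """Return the config with the highest validation score.

    `validate` is a hypothetical callable that trains a model with the
    given config and returns its validation-set score (higher is better).
    """
    return max(configs, key=validate)
```

For example, `split_days(range(10))` assigns seven days to training, two to validation, and one to testing, and `hyperparameter_grid()` yields nine candidate configurations per dataset. Selecting on the validation score alone mirrors the statement that final settings are chosen on the validation set, leaving the test days untouched until the final evaluation.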