Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning

Authors: Chuhao Zhou, Jianfei Yang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on two newly constructed benchmarks show that HoloLLM significantly outperforms existing MLLMs, improving language-grounded human sensing accuracy by up to 30%.
Researcher Affiliation	Academia	Chuhao Zhou¹, Jianfei Yang¹; ¹MARS Lab, Nanyang Technological University; EMAIL
Pseudocode	No	The paper describes the methodology in prose and mathematical formulations within Section 3 without explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Code: https://github.com/NTUMARS/HoloLLM
Open Datasets	Yes	We utilize two multimodal human-sensing datasets, MM-Fi [11] and XRF55 [10], with generated textual descriptions. ... All datasets used in this paper are publicly available multimodal human-sensing datasets; detailed information can be found in the original papers [11, 10].
Dataset Splits	Yes	We design three experimental settings: (1) Random Split (Random), (2) Cross-Subject Split (Cross Sub), and (3) Cross-Environment Split (Cross Env). Specifically, Random splits all samples randomly with a 3:1 train/test ratio, while Cross Sub / Cross Env select samples from non-overlapping human subjects / environments for training and testing. Detailed statistics on the training and testing set sizes for the three experimental settings are summarized in Tab. 4.
Hardware Specification	Yes	For stage one, we utilize the training set of the corresponding experimental setting (Random, Cross Sub, Cross Env) to pretrain the tailored encoders for 120 epochs on a single A100 GPU. ... For stage two, we train HoloLLM on 2 A100 GPUs for 5 epochs.
Software Dependencies	No	The paper mentions the AdamW optimizer but does not provide specific version numbers for software dependencies such as programming languages or libraries.
Experiment Setup	Yes	The AdamW optimizer with β1 = 0.9, β2 = 0.95, and a weight decay of 0.1 is adopted in our training. For stage one, we utilize the training set of the corresponding experimental setting (Random, Cross Sub, Cross Env) to pretrain the tailored encoders for 120 epochs on a single A100 GPU. The learning rate is initialized to 0.1 with a linear warmup over 10 epochs, and is then decayed by a factor of 0.1 at the 60th and 100th epochs. For stage two, we train HoloLLM on 2 A100 GPUs for 5 epochs. We set the accumulated iterations to 4, giving an effective batch size of 64 / 48 for the MM-Fi / XRF55 datasets, respectively. Following OneLLM [30], a linear warmup is used for the first 2K iterations with a maximum learning rate of 2e-5.
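The three split protocols in the Dataset Splits row can be illustrated with a minimal sketch, assuming each sample carries a subject ID and an environment ID. The `Sample` record, its field names, and the held-out groups below are hypothetical illustrations, not taken from the HoloLLM code release. The key property is that Cross Sub and Cross Env hold out whole groups, so no subject (or environment) appears in both training and testing.

```python
import random
from collections import namedtuple

# Hypothetical sample record; the real datasets (MM-Fi, XRF55) store far more per sample.
Sample = namedtuple("Sample", ["sample_id", "subject", "environment"])


def random_split(samples, ratio=3, seed=0):
    """Random setting: shuffle all samples, then split train:test = ratio:1."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * ratio // (ratio + 1)
    return shuffled[:n_train], shuffled[n_train:]


def group_split(samples, key, test_groups):
    """Cross Sub / Cross Env setting: every sample whose group (subject or
    environment, selected by `key`) is in `test_groups` goes to the test set,
    so train and test share no group."""
    test_groups = set(test_groups)
    train = [s for s in samples if key(s) not in test_groups]
    test = [s for s in samples if key(s) in test_groups]
    return train, test


# Toy data: 40 samples spread over 8 subjects and 4 environments.
samples = [Sample(i, i % 8, i % 4) for i in range(40)]
train_r, test_r = random_split(samples)
train_s, test_s = group_split(samples, key=lambda s: s.subject, test_groups={6, 7})
train_e, test_e = group_split(samples, key=lambda s: s.environment, test_groups={3})
```

For real experiments, scikit-learn's group-aware splitters (e.g. `GroupShuffleSplit`) implement the same idea; which subjects and environments the authors actually held out is reported in the paper (Tab. 4), not here.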
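The schedule and batch arithmetic in the Experiment Setup row can be made concrete with a small sketch. It assumes 0-indexed epochs, a per-epoch linear warmup, step decay that takes effect at epochs 60 and 100 (the convention PyTorch's `MultiStepLR` uses), and that "accumulated iterations" means gradient-accumulation steps, so that effective batch = per-GPU batch × 2 GPUs × 4 steps. The post-warmup schedule for stage two is not stated in this summary, so the sketch simply holds the peak learning rate; that choice is a hypothetical placeholder, not the authors' setting.

```python
# Stage one (encoder pretraining): values from the Experiment Setup row.
STAGE1_BASE_LR = 0.1
STAGE1_WARMUP_EPOCHS = 10
STAGE1_MILESTONES = (60, 100)
STAGE1_GAMMA = 0.1


def stage1_lr(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch (assumed indexing convention)."""
    if epoch < STAGE1_WARMUP_EPOCHS:
        # Linear ramp reaching the base LR at the last warmup epoch.
        return STAGE1_BASE_LR * (epoch + 1) / STAGE1_WARMUP_EPOCHS
    # Multiply by gamma once for every milestone already reached.
    n_decays = sum(epoch >= m for m in STAGE1_MILESTONES)
    return STAGE1_BASE_LR * STAGE1_GAMMA ** n_decays


# Stage two: linear warmup over the first 2K iterations to a peak of 2e-5.
STAGE2_MAX_LR = 2e-5
STAGE2_WARMUP_ITERS = 2000


def stage2_lr(iteration: int) -> float:
    """Warmup only; holding the peak afterwards is a hypothetical placeholder."""
    return STAGE2_MAX_LR * min(1.0, (iteration + 1) / STAGE2_WARMUP_ITERS)


def per_gpu_batch(effective: int, n_gpus: int = 2, accum_steps: int = 4) -> int:
    """Per-GPU batch implied by an effective batch size under gradient accumulation."""
    denom = n_gpus * accum_steps
    if effective % denom:
        raise ValueError("effective batch size must be divisible by n_gpus * accum_steps")
    return effective // denom


# Reported AdamW hyperparameters; in PyTorch these would be passed as
# torch.optim.AdamW(params, lr=..., betas=ADAMW_BETAS, weight_decay=ADAMW_WEIGHT_DECAY).
ADAMW_BETAS = (0.9, 0.95)
ADAMW_WEIGHT_DECAY = 0.1
```

Under these assumptions, the reported effective batch sizes of 64 (MM-Fi) and 48 (XRF55) correspond to per-GPU batches of 8 and 6, respectively.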