Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models

Authors: Zelin Peng, Zhengqin Xu, Qingyang Liu, Xiaokang Yang, Wei Shen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments across multiple MLLM benchmarks demonstrate that HyperET consistently improves both existing pre-training and fine-tuning MLLMs clearly with less than 1% additional parameters.
Researcher Affiliation	Academia	1 MoE Key Lab of Artificial Intelligence, AI Institute, School of Computer Science, SJTU; 2 State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, CAS
Pseudocode	No	The paper describes the methodology and theoretical analysis in sections 3 and 4, and includes mathematical derivations and proofs, but does not present a structured pseudocode or algorithm block.
Open Source Code	No	Answer: [No] Justification: The code will be made public after acceptance.
Open Datasets	Yes	We evaluate our method on ScienceQA [42], a challenging large-scale VQA benchmark... Our experimental framework evaluates the proposed method against state-of-the-art pre-trained MLLMs across 12 standard visual language benchmarks, including VQAv2 [21], GQA [25], VW: VisWiz [22], SQA: ScienceQA-IMG [42], TVQA: TextVQA [53], PE: POPE [37], ME: MME [67], MB: MMBench [41], MBCN: MMBench-Chinese [41], SD: SEEDBench [34], LVAW: LLaVA-Bench (In-the-Wild) [40] and M-Vet [68].
Dataset Splits	Yes	We evaluate our method on ScienceQA [42], a challenging large-scale VQA benchmark... Our experimental framework evaluates the proposed method against state-of-the-art pre-trained MLLMs across 12 standard visual language benchmarks.
Hardware Specification	Yes	All experiments are conducted using a maximum of 8 NVIDIA H800 GPUs.
Software Dependencies	No	The paper mentions using specific models and frameworks like LLaMA-7B, LLaMA-13B, Vicuna-7B, Vicuna-13B, and CLIP, but does not provide specific version numbers for underlying software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	We follow the experiment setting in [43]. All models utilize a CLIP pre-trained ViT-L/14 visual encoder. The weights of HyperET in this task are implemented using the three parameter-efficient scaling matrices, i.e., $W_s^{D}$, $W_s^{BD}$, and $W_s^{B}$, and are adapted in the attention layer, consistent with most parameter-efficient tuning methods, e.g., LoRA [24]. The curvature c is 0.01.
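
The Software Dependencies row is marked "No" because no version numbers are reported. As an illustration only, and not something taken from the paper or from the index pipeline, the sketch below shows one way such versions could be recorded alongside an experiment using the Python standard library. The function name `snapshot_versions` and the package list passed to it are hypothetical.

```python
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict mapping 'python' and each package name to a version string.

    Packages that are not installed map to None instead of raising,
    so the snapshot can be written even on a partial environment.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package names here are examples; substitute the project's real dependencies.
    for name, version in snapshot_versions(["torch", "numpy"]).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Saving this output next to the training logs would give readers the kind of version information this row checks for. The CUDA version is not covered by this sketch and would need to be recorded separately.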