Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models

Authors: Qiushi Huang, Tom Ko, Zhan Zhuang, Lilian Tang, Yu Zhang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, HiRA outperforms LoRA and its variants on several tasks, with extensive ablation studies validating its effectiveness. Our code is available at https://github.com/hqsiswiliam/hira. In this section, we conduct experiments on three tasks to evaluate the proposed HiRA method.
Researcher Affiliation | Collaboration | 1Southern University of Science and Technology, 2University of Surrey, 3ByteDance, 4City University of Hong Kong
Pseudocode | No | The paper describes the methodology using mathematical formulations and textual explanations, but it does not include a dedicated section or figure for pseudocode or an algorithm.
Open Source Code | Yes | Our code is available at https://github.com/hqsiswiliam/hira.
Open Datasets | Yes | Commonsense Reasoning. We utilize eight sub-tasks with predefined training and testing datasets (Hu et al., 2023)1, combining 170,420 query-answer pairs... The sub-tasks include BoolQ (Clark et al., 2019)... PIQA (Bisk et al., 2020)... SIQA (Sap et al., 2019)... HellaSwag (Zellers et al., 2019)... WinoGrande (Sakaguchi et al., 2021)... ARC-c and ARC-e (Clark et al., 2018)... and OBQA (Mihaylov et al., 2018)... 1https://github.com/AGI-Edgerunners/LLM-Adapters/tree/main/dataset. Open-domain Dialogue Generation. We use the ConvAI2 dataset (Dinan et al., 2019)... Mathematical Reasoning. For this task, we employ MetaMath (Yu et al., 2023) as the training corpus and GSM8K (Cobbe et al., 2021) as the test dataset.
Dataset Splits | Yes | Commonsense Reasoning. We utilize eight sub-tasks with predefined training and testing datasets (Hu et al., 2023), combining 170,420 query-answer pairs for fine-tuning LLMs and selecting 120 random entries as a validation set. Open-domain Dialogue Generation. We use the ConvAI2 dataset (Dinan et al., 2019), including 17,878 training and 1,000 testing multi-turn conversations. Table 11 (detailed statistics of the commonsense reasoning datasets): Train 170,300 (Mixed), Validation 120 (Mixed). Table 8 (statistics of the CONVAI2 dataset): Train 17,878, Test 1,000.
Hardware Specification | Yes | The computational cost for training on the commonsense reasoning task is 14 GPU hours over 3 epochs on an Nvidia A100 80G GPU with Llama-3-8B, while the CONVAI2 task requires 9 GPU hours for a single epoch under HiRA (r = 32) on an Nvidia A100 80G with Llama-3-8B. Table 12 (comparison of GPU memory consumption and running time): LoRA (r = 32) GRAM 65.48 GB; HiRA (r = 32) GRAM 61.49 GB.
Software Dependencies | No | Following the identical training setup to (Liu et al., 2024) except for learning rate adjustments, we implement HiRA on the Llama-2-7B and Llama-3-8B models with r = 16 and r = 32, respectively. The AdamW optimizer (Loshchilov & Hutter, 2019) is employed with a learning rate of 0.001, which warms up for 100 steps. This mentions an optimizer but no specific version numbers for software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Implementation Details. Following the identical training setup to (Liu et al., 2024) except for learning rate adjustments, we implement HiRA on the Llama-2-7B and Llama-3-8B models with r = 16 and r = 32, respectively. The AdamW optimizer (Loshchilov & Hutter, 2019) is employed with a learning rate of 0.001, which warms up for 100 steps. For the commonsense reasoning dataset, we fine-tune LLMs for 3 epochs, with evaluations every 80 steps to select the best checkpoint based on the validation set. We place LoRA, DoRA, MoRA, and HiRA on the query, key, value weights, and two linear layers (i.e., down and up projection) in attention modules. [...] Experiments on the CONVAI2 dataset use 1 training epoch, while the mathematical reasoning task uses 2 epochs... Table 7 (hyperparameters for HiRA): Optimizer AdamW; Weight Decay 0; Base Model [Llama-2-7B, Llama-3-8B]; Learning Rate [0.0001, 0.0002]; r [2, 4, 8, 16, 24, 28, 30, 32]; Warm-up 100 steps; Batch Size 32; Target Modules q_proj, k_proj, v_proj, up_proj, down_proj; Evaluation Steps every 80 steps.
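For context on the method being assessed: HiRA replaces LoRA's additive low-rank update with a Hadamard (element-wise) modulation of the frozen pretrained weight, i.e. an update of the form W0 ⊙ (AB), which can be high-rank even though A and B are low-rank. A minimal NumPy sketch of this idea (the shapes, initialization, and function name below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

# Illustrative sketch of a HiRA-style update dW = W0 * (A @ B):
# the frozen weight W0 is modulated element-wise by a low-rank
# product, so the update can be high-rank despite r << d.
rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8                  # r drawn from the paper's range (e.g. 8)
W0 = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((d_out, r)) * 0.01  # trainable low-rank factor
B = np.zeros((r, d_in))                     # zero-init so the adapter starts as a no-op

def hira_forward(x, W0, A, B):
    """y = (W0 + W0 * (A @ B)) @ x; only A and B would be trained."""
    dW = W0 * (A @ B)          # Hadamard modulation of the frozen weight
    return (W0 + dW) @ x

x = rng.standard_normal(d_in)
y = hira_forward(x, W0, A, B)
# With B zero-initialized, the output equals the frozen model's output.

# Contrast with LoRA: an additive update A2 @ B2 has rank <= r, while
# W0 * (A2 @ B2) = diag(A2) @ W0 @ diag(B2) for rank-1 factors, whose
# rank is typically that of W0 itself.
A2 = rng.standard_normal((d_out, 1))
B2 = rng.standard_normal((1, d_in))
rank_lora = np.linalg.matrix_rank(A2 @ B2)
rank_hira = np.linalg.matrix_rank(W0 * (A2 @ B2))
```

The rank comparison at the end is the key point: a rank-1 additive LoRA update stays rank 1, while the same factors used as a Hadamard mask on a generic W0 produce a (near-)full-rank update.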