Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Authors: Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on code generation, commonsense reasoning, mathematical reasoning, general language understanding, and image generation benchmarks show that GraLoRA consistently outperforms LoRA and other baselines, achieving up to +8.5% absolute gain in Pass@1 on HumanEval+. These improvements hold across model sizes and rank settings, making GraLoRA a scalable and robust solution for PEFT.
Researcher Affiliation	Collaboration	Yeonjoon Jung (1, 2), Daehyun Ahn (1), Hyungjun Kim (1), Taesu Kim (1), Eunhyeok Park (2); (1) SqueezeBits, (2) POSTECH
Pseudocode	No	The paper describes its methods through textual explanations and figures (e.g., Figures 1, 5, 7, and 10) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps.
Open Source Code	Yes	Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Yes, we do provide open access to code with sufficient instructions as supplemental material.
Open Datasets	Yes	We fine-tuned LLaMA3.1 8B [9] with 4 A100 80G GPUs on the Magicoder Evol-Instruct-110k [30] train dataset... Evaluation was conducted on the HumanEval+ test dataset following He et al. [10]... We fine-tuned LLaMA3.2 3B on the MetaMathQA [34] train dataset... Evaluation was done on the MATH [11] dataset... We trained and evaluated RoBERTa-base [19], an encoder-only architecture model, on the GLUE [28] benchmark... We fine-tuned SDXL [24] following the official training setup from the Huggingface diffusers repository, using the Naruto-Blip-Captions [5] dataset...
Dataset Splits	Yes	The dataset was split 90% for training and 10% for evaluation.
Hardware Specification	Yes	We fine-tuned LLaMA3.1 8B [9] with 4 A100 80G GPUs... Training was performed on 2 H100 80G GPUs for 1.5-8B models, and on 8 A100 80G GPUs for the 70B model... We fine-tuned LLaMA3.2 3B on the MetaMathQA [34] train dataset using 4 H100 80G GPUs... All training runs were done on a single H100 80G GPU... We fine-tuned SDXL [24] following the official training setup from the Huggingface diffusers repository, using the Naruto-Blip-Captions [5] dataset on a single H100 80G GPU.
Software Dependencies	No	The paper mentions the 'Huggingface diffusers repository' and the 'BigCode Evaluation Harness [1]' but does not provide specific version numbers for any key software components or libraries required to reproduce the experiments. The NeurIPS checklist states that code is provided in the supplemental material, which would typically include such details, but the main paper text itself lacks this information.
Experiment Setup	Yes	Table 9: Hyperparameters for Code Generation, Commonsense Reasoning, Mathematical Reasoning, and Personalized Image Generation tasks. Columns: Task, Model, Method, Rank, LR, Batch size, Epochs, Optimizer. ... Table 10: Detailed hyperparameter settings for each sub-task in General Language Understanding. Columns: Model, Task, Method, Rank, LR, Head-LR, Batch size, Epochs, Optimizer.
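The Research Type entry reports "up to +8.5% absolute gain in Pass@1 on HumanEval+". Pass@1 is the fraction of benchmark problems for which a generated completion passes all unit tests, so an absolute gain is a difference in percentage points rather than a relative improvement. The entry does not state how many completions were sampled per problem. The sketch below therefore assumes the unbiased pass@k estimator introduced with the original HumanEval benchmark. The per-problem counts in it are hypothetical illustrations, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: completions sampled, c: completions that pass all tests, k: budget.
    """
    if n < k:
        raise ValueError("need at least k samples per problem")
    if n - c < k:
        # Every size-k subset of the samples contains a passing completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem (n, c) counts, NOT results from the paper.
baseline = [(10, 3), (10, 0), (10, 10)]
tuned = [(10, 5), (10, 1), (10, 10)]

# Absolute gain = difference of the two Pass@1 scores (percentage points).
gain = mean_pass_at_k(tuned, 1) - mean_pass_at_k(baseline, 1)
print(f"absolute gain in Pass@1: {gain * 100:.1f} percentage points")
```

With k = 1 the estimator reduces to c / n per problem, so Pass@1 is simply the average per-problem success rate.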
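The Dataset Splits entry records only the 90%/10% ratio. It does not say how examples were shuffled or which random seed was used, and both details matter for reproducing the split exactly. The following is a minimal sketch of a seeded split, assuming a simple random shuffle. The function name `split_90_10` and the seed value are hypothetical and not taken from the paper's code.

```python
import random


def split_90_10(examples, seed=0):
    """Hypothetical helper: shuffle a copy with a fixed seed, then split 90/10."""
    items = list(examples)
    rng = random.Random(seed)  # local RNG, so the split ignores global random state
    rng.shuffle(items)
    cut = len(items) * 9 // 10  # integer arithmetic avoids float rounding at the cut
    return items[:cut], items[cut:]


train, evaluation = split_90_10(range(1000), seed=42)
print(len(train), len(evaluation))
```

Because the generator is seeded locally, calling the helper again with the same inputs and seed returns the identical partition. That is the property a reproducibility check would need to confirm in the released code.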