Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

AC-LoRA: (Almost) Training-Free Access Control Aware Multi-Modal LLMs

Authors: Lara Magdalena Lazier, Aritra Dhar, Vasilije Stambolic, Lukas Cavigelli

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide an end-to-end prototype of AC-LoRA, evaluate it on two datasets, and show that AC-LoRA matches or even exceeds the performance of state-of-the-art LoRA mixing techniques while providing strong isolation guarantees. Furthermore, we show that AC-LoRA design can be directly applied to different modalities.
Researcher Affiliation	Collaboration	Lara Magdalena Lazier (1), Aritra Dhar (1), Vasilije Stambolic (2), Lukas Cavigelli (1); (1) Computing System Labs, Huawei Technologies Switzerland AG; (2) EPFL
Pseudocode	No	The paper describes the steps of AC-LoRA in a numbered list within Section 3 ('Summary of the Secure LoRA Retrieval and Merging') and visually in Figure 4, but it does not present them in a structured pseudocode or algorithm block format.
Open Source Code	Yes	AC-LoRA is open-source and is available at https://github.com/huawei-csl/AC-LoRA.
Open Datasets	Yes	We evaluate AC-LoRA on the RepLiQA [22] dataset, which consists of a wide range of knowledge-specific questions across 3591 documents spanning 17 different domains, and wikiarts [23], an image dataset that consists of 27 different style domains. ... 1. RepLiQA: RepLiQA (split 0, cf. Fig. B.2) consists of several small artificial articles covering various topics... 2. Flan: Flan V2 contains datasets of 10 task domains (cf. Tab. 2). ... Appendix F provides a list of the assets along with their licenses: RepLiQA dataset: CC BY 4.0; Flan V2 dataset: Apache License Version 2.0, January 2004; Wikiart dataset: BSD 3-Clause License; MMSci dataset: CC BY 4.0.
Dataset Splits	Yes	1. RepLiQA: ...We split it into an 80-20 training-test set, ensuring with stratification that each article is seen at least once in the training set. ... 2. Flan: ...We utilize their test set, which consists of 50 data points per task. As the training set used for the different LoRAs was not shared, we constructed one based on the official Flan V2 dataset for the retriever. In particular, we take the first 30k (or fewer for smaller tasks) samples of each selected task as the training set... Combining Knowledge: ...We built one test and two disjoint training sets (with and without context)... In total, each of the training sets contains 640 data points. The test set comprises all the combined LoRA questions and the remaining single-LoRA questions (a total of 1,065 data points).
Hardware Specification	Yes	We run our experiments on two workstation GPUs with 48 GB GDDR6 VRAM. ... Appendix E.1 Evaluation Setup: We run our experiments on two workstation GPUs, each with 10752 processing cores, 48 GB GDDR6 VRAM (384-bit bus and 768 GB/s memory bandwidth), and 38.7 TFLOPS single-precision performance. The GPUs are connected to a host (two x86 44-core CPUs with 256 GB RAM) over PCIe 4.0.
Software Dependencies	Yes	We finetune 17 LoRAs (with rank r = α = 64), one for each topic, using LLAMA3.1-8B-INSTRUCT as the base model. ...To build the embedding database for AC-LoRA, we use the ALL-MPNET-BASE-V2 [50] sentence transformer... We query QWEN2-VL to generate descriptions of the images... We then finetune STABLE-DIFFUSION-V1-4... To evaluate the different experiments on the RepLiQA dataset, we use GEMMA-3-27B to give each generated answer a grade... We use deepseek-r1:32b to extract the most relevant facts... Appendix F provides a list of the assets along with their licenses: Meta Llama: META LLAMA 3 COMMUNITY; Google Gemma: open source; Qwen models: royalty-free limited license; all-mpnet-base-v2: Apache License Version 2.0, January 2004; langchain: MIT; PEFT: Apache License Version 2.0, January 2004; Stable-diffusion: CreativeML Open RAIL-M, August 22, 2022.
Experiment Setup	Yes	We finetune 17 LoRAs (with rank r = α = 64), one for each topic, using LLAMA3.1-8B-INSTRUCT as the base model. As seen in Fig. 3(a), we keep the finetune step size 200 to avoid overfitting. ... We load the base model, LLAMA3.1-8B, using 4-bit precision, nested (double) quantization, with normalized 4-bit quantization type and bfloat16 as the compute data type. We configure the LoRA adapters with an attention dimension r = 16, scaling factor α = 64, and a dropout probability 0.1. ... Appendix E.2.1 RepLiQA: For language models, we use unsloth [60] to fine-tune the different LoRAs... We finetune the 17 LoRAs for the RepLiQA dataset with the hyperparameters displayed in Tab. E.1. Table E.1: epochs: 3; per_device_train_batch_size: 4; gradient_accumulation_steps: 8; learning_rate: 1e-4; lora_alpha: 64; r: 64; lora modules: o_proj, k_proj, gate_proj, down_proj, v_proj, q_proj, up_proj.
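The Pseudocode row notes that the AC-LoRA steps appear only as a numbered list and a figure. As a reading aid, the sketch below shows one generic way an access-control-aware adapter selection could be written. It is not the authors' algorithm: the `Adapter` record, the role-based permission check, cosine-similarity ranking, the top-`k` cutoff, and the similarity-proportional merge weights are all illustrative assumptions. The only grounded elements are that AC-LoRA retrieves LoRAs through an embedding database and aims to provide isolation guarantees.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    """Hypothetical entry in an adapter embedding database (illustrative only)."""
    name: str
    embedding: tuple[float, ...]   # e.g. a sentence embedding describing the adapter's domain
    allowed_roles: frozenset[str]  # assumed permission model: roles allowed to use this adapter


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_adapters(query_emb, adapters, user_roles, k=2):
    """Return {adapter name: merge weight} for the top-k adapters the user may access."""
    # 1. Access control first: unauthorized adapters are removed before ranking,
    #    so they can never contribute to the merged model.
    permitted = [a for a in adapters if a.allowed_roles & set(user_roles)]
    # 2. Rank the permitted adapters by similarity to the query and keep the top k.
    ranked = sorted(permitted, key=lambda a: cosine(query_emb, a.embedding), reverse=True)[:k]
    # 3. Convert non-negative similarities into weights that sum to 1.
    scores = {a.name: max(cosine(query_emb, a.embedding), 0.0) for a in ranked}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()} if total > 0 else {}
```

Filtering before ranking is the design choice that matters here: a user holding only a `finance` role can never receive weight from an adapter restricted to `legal`, however similar that adapter is to the query.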
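The Dataset Splits row describes two constructions: a stratified 80-20 RepLiQA split in which every article appears at least once in training, and a Flan V2 training set built from the first 30k samples (or fewer) of each task. A minimal sketch of both follows, assuming each example is a dict with a hypothetical `article_id` key and that rounding `0.8 * n` to the nearest integer is an acceptable way to size each article's training share; the authors' exact procedure may differ.

```python
import random
from collections import defaultdict


def split_by_article(examples, train_frac=0.8, seed=0):
    """Per-article (stratified) train/test split; every article keeps >= 1 training example."""
    by_article = defaultdict(list)
    for ex in examples:
        by_article[ex["article_id"]].append(ex)  # "article_id" is a hypothetical field name
    rng = random.Random(seed)
    train, test = [], []
    for article_id in sorted(by_article):
        group = by_article[article_id][:]
        rng.shuffle(group)
        # max(1, ...) guarantees the article is seen at least once in training.
        n_train = max(1, round(train_frac * len(group)))
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


def flan_train_subset(task_to_samples, cap=30_000):
    """Keep the first `cap` samples of each task (all of them if the task is smaller)."""
    return {task: samples[:cap] for task, samples in task_to_samples.items()}
```

Splitting within each article, rather than over the pooled questions, is what keeps the 80-20 ratio roughly uniform across articles. Note that with this rounding an article with only one or two questions contributes nothing to the test set.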
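The figures in the Hardware Specification row are mutually consistent under two standard back-of-the-envelope formulas, which the snippet below checks: memory bandwidth equals bus width in bytes times the per-pin data rate, and peak FP32 throughput equals cores times 2 FLOPs per cycle (assuming one fused multiply-add per core per cycle) times clock. The implied 16 Gbit/s pin rate and roughly 1.8 GHz clock are derived here, not stated in the source.

```python
# Figures quoted in the Hardware Specification row (per GPU unless noted).
BUS_WIDTH_BITS = 384
MEMORY_BANDWIDTH_GB_S = 768
PROCESSING_CORES = 10752
FP32_TFLOPS = 38.7
VRAM_PER_GPU_GB = 48
NUM_GPUS = 2

# Bandwidth (GB/s) = bus width (bytes) x per-pin data rate (Gbit/s).
bus_width_bytes = BUS_WIDTH_BITS // 8                                  # 48 bytes
implied_pin_rate_gbit_s = MEMORY_BANDWIDTH_GB_S / bus_width_bytes      # 16.0

# Peak FP32 FLOP/s = cores x 2 FLOPs per cycle (assumed one FMA per core per cycle) x clock.
implied_clock_ghz = FP32_TFLOPS * 1e12 / (PROCESSING_CORES * 2) / 1e9  # about 1.80

total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB                              # 96

print(f"pin rate {implied_pin_rate_gbit_s:.1f} Gbit/s, clock {implied_clock_ghz:.2f} GHz, "
      f"total VRAM {total_vram_gb} GB")
```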
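The Experiment Setup row quotes two LoRA configurations (r = α = 64 for the RepLiQA adapters, and r = 16, α = 64 for the 4-bit setup) together with the Table E.1 batch settings. The sketch below works out what those numbers imply. It assumes the standard LoRA formulation, in which the low-rank update is scaled by α/r (PEFT's default unless rank-stabilized scaling is enabled), and a single device for the effective batch size, which the source does not state. The weight-memory helper is a rough lower bound that counts only the weights and uses a round 8e9 parameters for an 8B model.

```python
from dataclasses import dataclass

# Module names as listed in Table E.1.
TARGET_MODULES = ("q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj")


@dataclass(frozen=True)
class LoraRun:
    r: int
    lora_alpha: int
    per_device_train_batch_size: int = 4   # Table E.1 (RepLiQA runs)
    gradient_accumulation_steps: int = 8   # Table E.1 (RepLiQA runs)
    num_devices: int = 1                   # assumption: not stated in the source

    @property
    def scaling(self) -> float:
        # Standard LoRA: W = W0 + (alpha / r) * B @ A
        return self.lora_alpha / self.r

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer step.
        return self.per_device_train_batch_size * self.gradient_accumulation_steps * self.num_devices


def approx_weight_gb(n_params: float, bits_per_param: int) -> float:
    """Rough weight footprint in GB (1e9 bytes); ignores quantization constants and activations."""
    return n_params * bits_per_param / 8 / 1e9


repliqa_adapters = LoraRun(r=64, lora_alpha=64)
four_bit_setup = LoraRun(r=16, lora_alpha=64)  # only r and alpha are stated for this setup

print(repliqa_adapters.scaling, four_bit_setup.scaling)     # 1.0 4.0
print(repliqa_adapters.effective_batch_size)                # 32
print(approx_weight_gb(8e9, 4), approx_weight_gb(8e9, 16))  # 4.0 16.0
```

Under this reading, the r = 16 configuration applies its update with four times the scaling of the r = 64 adapters, and 4-bit loading cuts the nominal weight footprint of an 8B model from about 16 GB (bfloat16) to about 4 GB.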