HuRef: HUman-REadable Fingerprint for Large Language Models
Authors: Boyi Zeng, Lizheng Wang, Yuncong Hu, Yi Xu, Chenghu Zhou, Xinbing Wang, Yu Yu, Zhouhan Lin
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiment is twofold. First, we validated the effectiveness and robustness of invariant terms for identifying the base model. Second, we generated fingerprints based on invariant terms for 80 LLMs and quantitatively assessed their discrimination ability through a human subject study. |
| Researcher Affiliation | Academia | Boyi Zeng¹, Lizheng Wang², Yuncong Hu², Yi Xu², Chenghu Zhou³, Xinbing Wang², Yu Yu², Zhouhan Lin¹ — ¹LUMIA Lab, Shanghai Jiao Tong University; ²Shanghai Jiao Tong University; ³Chinese Academy of Sciences |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks (e.g., labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | The code is available at https://github.com/LUMIA-Group/HuRef. |
| Open Datasets | Yes | We independently trained GPT-NeoX-350M models on a subset of the Pile dataset (Gao et al., 2020). |
| Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits for all datasets used) for its main experiments. It refers to standard benchmarks like BoolQ, PIQA, etc., and mentions data synthesis for the encoder training, but not explicit splits for LLM evaluation. |
| Hardware Specification | Yes | We trained the CNN encoder for 2 hours using a single RTX 4090. Extracting invariant terms and calculating cosine similarity require only a small amount of CPU resources (a minimal similarity sketch follows the table). The most compute resources were consumed by the reproduced baselines in Section 5.1.4, for which we used 4 A100 40G GPUs for 8 days. |
| Software Dependencies | No | The paper describes the software components (e.g., 'StyleGAN2 generator'), but does not provide specific version numbers for any of the key software dependencies (e.g., libraries, frameworks). |
| Experiment Setup | Yes | In the training stage, we alternate training the discriminator and encoder every 10 steps. We set the batch size to 10, the initial learning rate to 0.0001, and introduce a noise intensity α of 0.16 for positive samples. After 8 epochs of training, we obtained the encoder used in our paper. We used a convolutional neural network (CNN) as the encoder. The CNN encoder takes invariant terms in $\mathbb{R}^{4096 \times 4096 \times 6}$ as input and produces a feature vector v as output. Our CNN encoder structure, as depicted in Figure 4, consists of four convolutional layers followed by a mean-pooling layer. The hyperparameters for the four convolutional layers are provided in the table below: ... For the discriminator: we utilize a simple 3-layer MLP. (A hedged sketch of this setup follows the table.) |
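
For context on the "extracting invariant terms and calculating cosine similarity" step noted in the hardware row, here is a minimal sketch. It assumes the invariant terms have already been extracted from model weights as tensors; the function name `invariant_term_similarity` and the random placeholder tensors are ours, not the authors'.

```python
# Minimal sketch, not the authors' code: comparing two models by the cosine
# similarity of their flattened invariant terms. The tensors below are random
# placeholders standing in for invariant terms extracted from real weights.
import torch
import torch.nn.functional as F

def invariant_term_similarity(term_a: torch.Tensor, term_b: torch.Tensor) -> float:
    """Return the cosine similarity between two flattened invariant terms."""
    return F.cosine_similarity(term_a.flatten(), term_b.flatten(), dim=0).item()

# Hypothetical check: a term from a fine-tuned derivative should stay close
# to the base model's term, while an independently trained model should not.
base = torch.randn(4096, 4096)                   # placeholder base-model term
derived = base + 0.01 * torch.randn_like(base)   # mild fine-tuning drift
independent = torch.randn(4096, 4096)            # unrelated model

print(invariant_term_similarity(base, derived))      # close to 1.0
print(invariant_term_similarity(base, independent))  # close to 0.0
```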
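The experiment-setup row describes the encoder and discriminator only at a structural level. The sketch below mirrors that structure in PyTorch: a 4-layer CNN encoder over $\mathbb{R}^{4096 \times 4096 \times 6}$ inputs with a final mean-pooling layer, and a 3-layer MLP discriminator, using the reported training settings. The per-layer channel counts, kernel sizes, strides, hidden width, feature dimension, and optimizer are elided in the excerpt ("the table below: ..."), so those values are hypothetical placeholders.

```python
# Minimal sketch under stated assumptions; conv hyperparameters, hidden
# sizes, and the optimizer are NOT given in the excerpt and are placeholders.
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    def __init__(self, in_channels: int = 6, feat_dim: int = 512):
        super().__init__()
        # Four convolutional layers (kernel/stride/channel values assumed).
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=7, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(128, feat_dim, kernel_size=3, stride=2), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: invariant terms of shape (batch, 6, 4096, 4096).
        h = self.convs(x)
        # Final mean-pooling layer collapses spatial dims to a feature vector v.
        return h.mean(dim=(2, 3))

class Discriminator(nn.Module):
    """Simple 3-layer MLP, as described in the paper (widths assumed)."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.mlp(v)

# Reported settings: batch size 10, initial learning rate 1e-4, noise
# intensity alpha = 0.16 on positive samples, 8 epochs, alternating the
# discriminator and encoder every 10 steps. Adam is our assumption.
encoder, disc = CNNEncoder(), Discriminator()
enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
alpha = 0.16  # positive samples: invariant terms perturbed by alpha-scaled noise
```

With the placeholder strides above, the 4096×4096 input is downsampled to a 63×63 map before mean pooling, so the encoder emits one `feat_dim`-sized vector per model, which the MLP then scores during the alternating adversarial training.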