How to Protect Copyright Data in Optimization of Large Language Models?
Authors: Timothy Chu, Zhao Song, Chiwun Yang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 7.1, we provided the details of our experiment. In Section 7.2, we provided experimental results and analyzed the effectiveness of Copyright Regression. |
| Researcher Affiliation | Collaboration | Timothy Chu (Google, Mountain View, CA); Zhao Song (Adobe Research, San Jose, CA); Chiwun Yang (Sun Yat-sen University, China) |
| Pseudocode | No | The paper describes mathematical definitions and theoretical properties but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of experiments in this paper is open-sourced at https://github.com/ChristianYang37/chiwun/tree/main/src/Copyright-Regression. |
| Open Datasets | Yes | We employed an open-source dataset Wikitext2 (Merity et al. 2016) to fine-tune our model, and evaluate the performance of our model on its test set. |
| Dataset Splits | No | The paper mentions using a 'test set' but does not explicitly provide details about a validation set split or specific percentages for training, validation, and test splits required for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | To assess the influence of copyright data with different proportions during training, we varied n1/n over {0.1, 0.2, 0.4, 0.6, 0.8}. Additionally, to evaluate the impact of different values of γc on copyright protection, we consider γc values of {0.1, 0.2, 0.3, 0.4, 0.5}. In addition, we fixed random seeds and conducted multiple experiments to record the maximum, minimum, and average values to ensure stable results were obtained. |
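
The sweep described in the Experiment Setup row can be sketched roughly as follows. This is only an illustration of the grid and the max/min/average bookkeeping, not the authors' code: `run_trial`, `sweep`, and the seed list are hypothetical names and values, the two parameters are assumed to be varied jointly (the row does not say whether they were swept together or one at a time), and the placeholder trial returns a seeded random number instead of fine-tuning a model on Wikitext2.

```python
import itertools
import random
import statistics

# Values taken from the Experiment Setup row.
COPYRIGHT_RATIOS = [0.1, 0.2, 0.4, 0.6, 0.8]  # n1/n: share of copyright data
GAMMA_C_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5]    # gamma_c: copyright-protection weight

# Assumption: the paper says seeds were fixed but does not list them.
SEEDS = [0, 1, 2]


def run_trial(ratio, gamma_c, seed):
    """Hypothetical stand-in for one fine-tuning and evaluation run.

    A real trial would train with the given n1/n and gamma_c and return an
    evaluation metric. Here it just returns a reproducible pseudo-random score.
    """
    rng = random.Random(seed)
    return rng.random()


def sweep(trial_fn=run_trial, seeds=SEEDS):
    """Run every (n1/n, gamma_c) pair over all seeds and summarize the scores."""
    results = {}
    for ratio, gamma_c in itertools.product(COPYRIGHT_RATIOS, GAMMA_C_VALUES):
        scores = [trial_fn(ratio, gamma_c, seed) for seed in seeds]
        results[(ratio, gamma_c)] = {
            "max": max(scores),
            "min": min(scores),
            "mean": statistics.mean(scores),
        }
    return results


if __name__ == "__main__":
    for (ratio, gamma_c), stats in sweep().items():
        print(f"n1/n={ratio}, gamma_c={gamma_c}: {stats}")
```

Passing a different `trial_fn` is how an actual training routine would be plugged in; fixing the seed list keeps repeated sweeps comparable.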