On the Ability of Developers' Training Data Preservation of Learnware
Authors: Hao-Yi Lei, Zhi-Hao Tan, Zhi-Hua Zhou
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper provides a theoretical analysis of the RKME specification about its preservation ability for the developer's training data. By modeling it as a geometric problem on manifolds and utilizing tools from geometric analysis, we prove that the RKME specification is able to disclose none of the developer's original data and possesses robust defense against common inference attacks, while preserving sufficient information for effective learnware identification. (From Abstract)...We have conducted validation experiments to further illustrate the tradeoff between data privacy and search quality in our work. Below, we present the experimental setting and empirical results. (From Appendix D) |
| Researcher Affiliation | Academia | Hao-Yi Lei, Zhi-Hao Tan, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {leihy, tanzh, zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Linkability Privacy Game... Algorithm 2 Inference Privacy Game |
| Open Source Code | No | The paper does not provide specific links or statements about the availability of open-source code for the methodology described. |
| Open Datasets | Yes | We use six real-world datasets: Postures [Gardner et al., 2014], Bank [Moro et al., 2014], Mushroom [Wagner et al., 2021], PPG-DaLiA [Reiss et al., 2019], PFS [Kaggle, 2018], and M5 [Makridakis et al., 2022]. |
| Dataset Splits | No | We naturally split each dataset into multiple parts with different data distributions based on categorical attributes, and each part is then further subdivided into training and test sets. (No explicit mention of a validation set) |
| Hardware Specification | No | The paper states that the experiments involve "various models" and implicitly require computational resources, but no specific hardware (CPU, GPU model, memory) is mentioned for running the experiments. |
| Software Dependencies | No | For the specification of RKME, we use a Gaussian kernel k(x₁, x₂) = exp(−γ‖x₁ − x₂‖₂²) with γ = 0.1. (Mentions model types such as linear models, LightGBM, and neural networks, but no version numbers for libraries or environments.) |
| Experiment Setup | Yes | For the specification of RKME, we use a Gaussian kernel k(x₁, x₂) = exp(−γ‖x₁ − x₂‖₂²) with γ = 0.1. For all user testing data, we set the number of synthetic data points in RKME, m, to 0, 10, 50, 100, 200, 500, and 1000 to explore the tradeoff between search ability and data privacy (when m is 0, a model is randomly selected). See the illustrative sketch after the table. |
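
Since no open-source code is reported above, the following is a minimal, illustrative sketch of the quoted setup: a Gaussian kernel with γ = 0.1 and an RKME-style reduced set of m synthetic points approximating the training data's kernel mean embedding. The herding-style greedy construction, the uniform point weights, and all function names here are assumptions for illustration only; actual RKME implementations optimize both the synthetic points and their weights, and the paper's own pipeline is not reproduced.

```python
import numpy as np

GAMMA = 0.1  # kernel bandwidth from the reported setup


def gaussian_kernel(X, Y):
    """Gaussian kernel k(x, y) = exp(-GAMMA * ||x - y||^2) between row sets."""
    sq_dists = (
        np.sum(X * X, axis=1)[:, None]
        + np.sum(Y * Y, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-GAMMA * np.maximum(sq_dists, 0.0))


def mmd_squared(X, Z):
    """Squared MMD between the empirical embeddings of X and Z
    (uniform weights; real RKME also learns per-point weights)."""
    return (
        gaussian_kernel(X, X).mean()
        - 2.0 * gaussian_kernel(X, Z).mean()
        + gaussian_kernel(Z, Z).mean()
    )


def rkme_reduced_set(X_train, m, n_candidates=2000, seed=0):
    """Pick m synthetic points by kernel herding over jittered candidates.

    The greedy selection stands in for the gradient-based optimization
    used by RKME implementations; it is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X_train), size=n_candidates)
    candidates = X_train[idx] + 0.1 * rng.standard_normal(X_train[idx].shape)

    # mu(c): mean kernel similarity of each candidate to the training set.
    mu = gaussian_kernel(candidates, X_train).mean(axis=1)

    selected, herd_sum = [], np.zeros(n_candidates)
    for t in range(m):
        # Herding: favor candidates close to the data embedding and far
        # from the points already chosen.
        scores = mu - herd_sum / (t + 1)
        best = int(np.argmax(scores))
        selected.append(best)
        herd_sum += gaussian_kernel(candidates, candidates[[best]])[:, 0]
    return candidates[selected]


if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((5000, 8))  # stand-in data
    for m in (10, 50, 100, 200, 500, 1000):  # sweep from the setup above
        Z = rkme_reduced_set(X, m)
        print(f"m={m:4d}  MMD^2={mmd_squared(X, Z):.5f}")
```

Sweeping m as in the quoted setup traces one axis of the privacy-search tradeoff the paper studies: larger m drives the MMD down (better learnware identification) while exposing more synthetic points to the market.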