On the Ability of Developers' Training Data Preservation of Learnware

Authors: Hao-Yi Lei, Zhi-Hao Tan, Zhi-Hua Zhou

NeurIPS 2024

Reproducibility Assessment (Variable, Result, LLM Response)
Research Type: Experimental
LLM Response: "This paper provides a theoretical analysis of the RKME specification about its preservation ability for developers' training data. By modeling it as a geometric problem on manifolds and utilizing tools from geometric analysis, we prove that the RKME specification is able to disclose none of the developer's original data and possesses robust defense against common inference attacks, while preserving sufficient information for effective learnware identification." (From Abstract) "We have conducted validation experiments to further illustrate the tradeoff between data privacy and search quality in our work. Below, we present the experimental setting and empirical results." (From Appendix D)
Researcher Affiliation: Academia
LLM Response: Hao-Yi Lei, Zhi-Hao Tan, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {leihy, tanzh, zhouzh}@lamda.nju.edu.cn
Pseudocode: Yes
LLM Response: "Algorithm 1 Linkability Privacy Game... Algorithm 2 Inference Privacy Game"
Open Source Code: No
LLM Response: The paper does not provide links to, or statements about, the availability of open-source code for the described methodology.
Open Datasets: Yes
LLM Response: "We use six real-world datasets: Postures [Gardner et al., 2014], Bank [Moro et al., 2014], Mushroom [Wagner et al., 2021], PPG-DaLiA [Reiss et al., 2019], PFS [Kaggle, 2018], and M5 [Makridakis et al., 2022]."
Dataset Splits: No
LLM Response: "We naturally split each dataset into multiple parts with different data distributions based on categorical attributes, and each part is then further subdivided into training and test sets." (No explicit mention of a validation set.)
Hardware Specification: No
LLM Response: The paper states that experiments involve "various models" and implicitly require computational resources, but no specific hardware (CPU or GPU models, memory) is mentioned for running the experiments.
Software Dependencies: No
LLM Response: "For the specification of RKME, we use a Gaussian kernel k(x₁, x₂) = exp(−γ‖x₁ − x₂‖₂²) with γ = 0.1." (Specific model types are mentioned, such as linear models, LightGBM, and neural networks, but no version numbers for libraries or environments.)
Experiment Setup: Yes
LLM Response: "For the specification of RKME, we use a Gaussian kernel k(x₁, x₂) = exp(−γ‖x₁ − x₂‖₂²) with γ = 0.1. For all user testing data, we set the number of synthetic data points in RKME, m, to 0, 10, 50, 100, 200, 500, and 1000 to explore the tradeoff between search ability and data privacy (when m is 0, a model is randomly selected)."
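The quoted Gaussian kernel is a standard RBF kernel. As a minimal sketch (not the authors' implementation), it can be computed with NumPy as follows, using the paper's γ = 0.1; the input vectors are illustrative values, not data from the paper:

```python
import numpy as np

def gaussian_kernel(x1, x2, gamma=0.1):
    """Gaussian (RBF) kernel k(x1, x2) = exp(-gamma * ||x1 - x2||_2^2).

    gamma = 0.1 matches the RKME specification setting reported in the paper.
    """
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Illustrative inputs (hypothetical, not from the paper):
print(gaussian_kernel([0.0, 0.0], [1.0, 2.0]))  # exp(-0.1 * 5) = exp(-0.5) ≈ 0.6065
print(gaussian_kernel([1.0, 1.0], [1.0, 1.0]))  # identical inputs give k = 1.0
```

In RKME, kernel values like these are aggregated into a kernel mean embedding, so the specification shared with the market is built from such pairwise kernel evaluations rather than from the raw training points.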