House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography
Authors: Xudong Pan, Shengyao Zhang, Mi Zhang, Yifan Yan, Min Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation shows that Cans is the first working scheme which can covertly transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets within a single carrier model, under a trivial distortion rate (< 10⁻⁵) and with almost no utility loss on the carrier model (< 1%). |
| Researcher Affiliation | Academia | Xudong Pan, Fudan University, xdpan18@fudan.edu.cn; Shengyao Zhang, Fudan University, shengyaozhang21@m.fudan.edu.cn; Mi Zhang, Fudan University, mi_zhang@fudan.edu.cn; Yifan Yan, Fudan University, yanyf20@fudan.edu.cn; Min Yang, Fudan University, m_yang@fudan.edu.cn |
| Pseudocode | Yes | We then invoke the primitive Fill(P, f_k, v_k) in Algorithm A.1 in the supplementary material to replace the original parameters in f_k by parameters in P. |
| Open Source Code | Yes | To facilitate future research, we open-source our code in https://anonymous.4open.science/r/data-hiding-66D0/. |
| Open Datasets | Yes | CIFAR-10 [24]: This dataset contains 60,000 images of daily objects (e.g., cat, truck and ship). FaceScrub [29]: This dataset contains 107,818 face images of 530 male and female celebrities retrieved from the Internet. Speech Command (i.e., Speech) [44]: This dataset contains 35 different voice commands spoken by multiple subjects, and is composed of over 100,000 audio files of 1 second length with a sampling frequency of 16 kHz. |
| Dataset Splits | No | The paper mentions training and evaluating on datasets (e.g., CIFAR-10, FaceScrub, Speech Command) but does not explicitly state the specific percentages or methods used for validation dataset splits in the main text. |
| Hardware Specification | No | The paper does not explicitly specify the hardware components (e.g., GPU models, CPU types, memory) used for running the experiments in the provided text. |
| Software Dependencies | No | The paper mentions using an optimizer 'Adam [21]' but does not provide specific version numbers for any software dependencies or libraries used in the experimental setup in the provided text. |
| Experiment Setup | Yes | In each secret task, we set the dimension of the pseudorandom noise vectors as 100 and the secret model as an off-the-shelf generator-like architecture which is detailed in the supplementary materials. We consider a standard ResNet-18 [19] as the carrier model, and the training on the CIFAR-10 [24] dataset as the open task. Subsequently, we invoke the Update primitive and resume the joint training to the next iteration. |
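
The Pseudocode row above quotes a Fill primitive that replaces a secret model's parameters with parameters taken from a pool P. The sketch below is one hedged, simplified reading of that idea in plain Python, not the paper's Algorithm A.1. The function name `fill`, the dictionary-of-lists model representation, and the treatment of the third argument as an integer seed are all assumptions made for illustration; the excerpt does not state what that argument is.

```python
import random


def fill(pool, model_params, key):
    """Hypothetical Fill-like primitive (not the paper's Algorithm A.1).

    Overwrites every parameter slot of `model_params` with a value taken
    from `pool`. `key` is ASSUMED to be an integer seed that fixes which
    pool entry lands in which slot.
    """
    rng = random.Random(key)  # same key -> same slot-to-pool assignment
    filled = {}
    for name in sorted(model_params):  # fixed layer order for determinism
        size = len(model_params[name])
        # Pool entries may be reused, so the pool can be smaller than the model.
        indices = [rng.randrange(len(pool)) for _ in range(size)]
        filled[name] = [pool[i] for i in indices]
    return filled


# Toy data: a 4-entry pool and a 9-parameter "secret model".
pool = [0.1, -0.2, 0.3, 0.05]
secret_model = {"fc1": [0.0] * 6, "fc2": [0.0] * 3}

a = fill(pool, secret_model, key=42)
b = fill(pool, secret_model, key=42)
```

Under these assumptions, two parties holding the same pool and the same key rebuild identical parameters, and the toy model has more parameter slots than the pool has entries.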