House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography

Authors: Xudong Pan, Shengyao Zhang, Mi Zhang, Yifan Yan, Min Yang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation shows that Cans is the first working scheme which can covertly transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets within a single carrier model, under a trivial distortion rate (< 10^-5) and with almost no utility loss on the carrier model (< 1%).
Researcher Affiliation | Academia | Xudong Pan, Fudan University, xdpan18@fudan.edu.cn; Shengyao Zhang, Fudan University, shengyaozhang21@m.fudan.edu.cn; Mi Zhang, Fudan University, mi_zhang@fudan.edu.cn; Yifan Yan, Fudan University, yanyf20@fudan.edu.cn; Min Yang, Fudan University, m_yang@fudan.edu.cn
Pseudocode | Yes | We then invoke the primitive Fill(P, f_k, v_k) in Algorithm A.1 in the supplementary material to replace the original parameters in f_k by parameters in P.
Open Source Code | Yes | To facilitate future research, we open-source our code at https://anonymous.4open.science/r/data-hiding-66D0/.
Open Datasets | Yes | CIFAR-10 [24]: This dataset contains 60,000 images of daily objects (e.g., cat, truck and ship). FaceScrub [29]: This dataset contains 107,818 face images of 530 male and female celebrities retrieved from the Internet. Speech Command (i.e., Speech) [44]: This dataset contains 35 different voice commands spoken by multiple subjects, and is composed of over 100,000 audio files, each 1 second long, with a sampling frequency of 16 kHz.
Dataset Splits | No | The paper mentions training and evaluating on datasets (e.g., CIFAR-10, FaceScrub, Speech Command) but does not explicitly state the percentages or methods used for validation splits in the main text.
Hardware Specification | No | The paper does not explicitly specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments in the provided text.
Software Dependencies | No | The paper mentions the Adam optimizer [21] but does not provide version numbers for any software dependencies or libraries used in the experimental setup in the provided text.
Experiment Setup | Yes | In each secret task, we set the dimension of the pseudorandom noise vectors as 100 and the secret model as an off-the-shelf generator-like architecture which is detailed in the supplementary materials. We consider a standard ResNet-18 [19] as the carrier model, and the training on the CIFAR-10 [24] dataset as the open task. Subsequently, we invoke the Update primitive and resume the joint training to the next iteration.
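The Pseudocode row quotes a Fill(P, f_k, v_k) primitive that replaces the parameters of f_k with parameters taken from P. The sketch below is only an illustration of that idea and is not the authors' Algorithm A.1, which is in the supplementary material and not reproduced here. It makes several assumptions: the pool P is a flat list of floats, the model f_k is a dictionary mapping hypothetical layer names to flat lists, and v_k acts as an integer seed that selects pool positions. The role of v_k is not described in the quoted text, so that last point in particular is a guess.

```python
import random


def fill(pool, model_params, key):
    """Overwrite every entry of model_params with values drawn from pool.

    pool         -- flat list of floats (assumed stand-in for the pool P)
    model_params -- dict: layer name -> flat list of floats (stand-in for f_k)
    key          -- integer seed (assumed role of v_k; not from the paper)

    Returns a dict: layer name -> list of pool indices that were used.
    """
    rng = random.Random(key)  # same key gives the same index sequence
    assignment = {}
    # Visit layers in sorted order so the mapping does not depend on
    # dictionary insertion order.
    for name in sorted(model_params):
        size = len(model_params[name])
        indices = [rng.randrange(len(pool)) for _ in range(size)]
        # Replace the layer's original values with the selected pool values.
        model_params[name] = [pool[i] for i in indices]
        assignment[name] = indices
    return assignment


# Toy data: a 10-value pool and a two-layer "secret model" (hypothetical names).
pool = [0.1 * i for i in range(10)]
secret_model = {"layer1": [0.0] * 4, "layer2": [0.0] * 3}

first = fill(pool, secret_model, key=42)
snapshot = {name: list(values) for name, values in secret_model.items()}
second = fill(pool, secret_model, key=42)
```

Under these assumptions, calling fill twice with the same key yields the same index assignment and the same filled values, which is the property a receiver holding the key would rely on to rebuild the secret model from the shared pool.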