Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography

Authors: Xudong Pan, Shengyao Zhang, Mi Zhang, Yifan Yan, Min Yang

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation shows Cans is the first working scheme which can covertly transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets within a single carrier model, under a trivial distortion rate (< 10⁻⁵) and with almost no utility loss on the carrier model (< 1%).
Researcher Affiliation | Academia | Xudong Pan (Fudan University), Shengyao Zhang (Fudan University), Mi Zhang (Fudan University), Yifan Yan (Fudan University), Min Yang (Fudan University)
Pseudocode | Yes | We then invoke the primitive Fill(P, f_k, v_k) in Algorithm A.1 in the supplementary material to replace the original parameters in f_k by parameters in P.
Open Source Code | Yes | To facilitate future research, we open-source our code in https://anonymous.4open.science/r/data-hiding-66D0/.
Open Datasets | Yes | CIFAR-10 [24]: This dataset contains 60,000 images of daily objects (e.g., cat, truck and ship). FaceScrub [29]: This dataset contains 107,818 face images of 530 male and female celebrities retrieved from the Internet. Speech Command (i.e., Speech) [44]: This dataset contains 35 different voice commands spoken by multiple subjects, and is composed of over 100,000 audio files of 1 second length with a sampling frequency of 16 kHz.
Dataset Splits | No | The paper mentions training and evaluating on datasets (e.g., CIFAR-10, FaceScrub, Speech Command) but does not explicitly state the specific percentages or methods used for validation dataset splits in the main text.
Hardware Specification | No | The paper does not explicitly specify the hardware components (e.g., GPU models, CPU types, memory) used for running the experiments in the provided text.
Software Dependencies | No | The paper mentions using the Adam optimizer [21] but does not provide specific version numbers for any software dependencies or libraries used in the experimental setup in the provided text.
Experiment Setup | Yes | In each secret task, we set the dimension of the pseudorandom noise vectors as 100 and the secret model as an off-the-shelf generator-like architecture which is detailed in the supplementary materials. We consider a standard ResNet-18 [19] as the carrier model, and the training on the CIFAR-10 [24] dataset as the open task. Subsequently, we invoke the Update primitive and resume the joint training to the next iteration.
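The quoted excerpts above outline the scheme's core moving parts: a generator-like secret model driven by 100-dimensional pseudorandom noise is embedded into the carrier model's parameters via the Fill primitive, and the receiver later extracts it to regenerate the hidden samples. A toy numpy sketch of that pipeline (the linear "generator", the parameter shapes, and the shared-seed convention are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Sender side ---
carrier = rng.normal(size=1000)        # flattened carrier-model parameters (toy)
secret_W = rng.normal(size=(100, 8))   # tiny linear "generator" standing in for
                                       # the paper's generator-like secret model
# Positions hosting the secret parameters; in practice a shared secret
# (here derived from the same seeded stream for brevity).
positions = rng.choice(carrier.size, size=secret_W.size, replace=False)

stego = carrier.copy()
stego[positions] = secret_W.ravel()    # the Fill(P, f_k, v_k) idea: overwrite
                                       # selected carrier parameters with P

# --- Receiver side: knows the positions and the noise seed ---
extracted = stego[positions].reshape(100, 8)
noise = np.random.default_rng(7).normal(size=(5, 100))  # 100-dim pseudorandom
samples = noise @ extracted            # each noise vector regenerates one sample
```

In the actual scheme, the hidden parameters are trained jointly with the open task (with an Update primitive re-selecting the hosting positions between iterations), which is how the carrier's utility loss stays below 1%; the substitution here is shown in isolation for clarity.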