Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography
Authors: Xudong Pan, Shengyao Zhang, Mi Zhang, Yifan Yan, Min Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation shows that Cans is the first working scheme which can covertly transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets within a single carrier model, under a trivial distortion rate (< 10⁻⁵) and with almost no utility loss on the carrier model (< 1%). |
| Researcher Affiliation | Academia | Xudong Pan (Fudan University); Shengyao Zhang (Fudan University); Mi Zhang (Fudan University); Yifan Yan (Fudan University); Min Yang (Fudan University) |
| Pseudocode | Yes | We then invoke the primitive Fill(P, f_k, v_k) in Algorithm A.1 in the supplementary material to replace the original parameters in f_k by parameters in P. |
| Open Source Code | Yes | To facilitate future research, we open-source our code in https://anonymous.4open.science/r/data-hiding-66D0/. |
| Open Datasets | Yes | CIFAR-10 [24]: This dataset contains 60,000 images of daily objects (e.g., cat, truck and ship). Face Scrub [29]: This dataset contains 107,818 face images of 530 male and female celebrities retrieved from the Internet. Speech Command (i.e., Speech) [44]: This dataset contains 35 different voice commands spoken by multiple subjects, and is composed of over 100,000 one-second audio files with a sampling frequency of 16 kHz. |
| Dataset Splits | No | The paper mentions training and evaluating on datasets (e.g., CIFAR-10, Face Scrub, Speech Command) but does not explicitly state the specific percentages or methods used for validation dataset splits in the main text. |
| Hardware Specification | No | The paper does not explicitly specify the hardware components (e.g., GPU models, CPU types, memory) used for running the experiments in the provided text. |
| Software Dependencies | No | The paper mentions using an optimizer 'Adam [21]' but does not provide specific version numbers for any software dependencies or libraries used in the experimental setup in the provided text. |
| Experiment Setup | Yes | In each secret task, we set the dimension of the pseudorandom noise vectors as 100 and the secret model as an off-the-shelf generator-like architecture which is detailed in the supplementary materials. We consider a standard ResNet-18 [19] as the carrier model, and the training on the CIFAR-10 [24] dataset as the open task. Subsequently, we invoke the Update primitive and resume the joint training to the next iteration. |
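The table names the Fill and Update primitives only briefly. Algorithm A.1 lives in the supplementary material. The sketch below illustrates one possible reading of those primitives and is not the authors' implementation. It rests on several assumptions:

- The carrier model's parameters are flattened into a pool `P`.
- `v_k` acts as a key-derived list of positions in that pool.
- Fill copies the pool values at those positions into the tensors of the secret model `f_k`.
- Update writes the retrained secret values back to the same positions before joint training resumes.

The helper names `key_indices`, `fill`, and `update` are hypothetical, as are the toy tensor shapes. The sketch uses plain Python lists in place of real model tensors.

```python
import random
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, ...]


def numel(shape: Shape) -> int:
    """Number of scalar entries in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def key_indices(key: int, pool_size: int, count: int) -> List[int]:
    """Hypothetical stand-in for v_k: `count` distinct pool positions
    derived deterministically from a secret key."""
    rng = random.Random(key)
    return rng.sample(range(pool_size), count)


def fill(pool: Sequence[float], shapes: Dict[str, Shape],
         indices: Sequence[int]) -> Dict[str, List[float]]:
    """Hypothetical Fill: replace each secret-model tensor (kept flat here)
    with the carrier values gathered at `indices`, consumed in order."""
    needed = sum(numel(s) for s in shapes.values())
    if len(indices) != needed:
        raise ValueError(f"need {needed} indices, got {len(indices)}")
    secret, pos = {}, 0
    for name, shape in shapes.items():
        n = numel(shape)
        secret[name] = [pool[i] for i in indices[pos:pos + n]]
        pos += n
    return secret


def update(pool: List[float], secret: Dict[str, List[float]],
           shapes: Dict[str, Shape], indices: Sequence[int]) -> None:
    """Hypothetical Update: write the (possibly retrained) secret values
    back into the carrier pool at the same positions, in place."""
    pos = 0
    for name, shape in shapes.items():
        n = numel(shape)
        for i, value in zip(indices[pos:pos + n], secret[name]):
            pool[i] = value
        pos += n


if __name__ == "__main__":
    carrier = [0.001 * i for i in range(1000)]  # toy flattened carrier parameters
    shapes = {"fc.weight": (4, 3), "fc.bias": (4,)}  # toy secret-model layout
    idx = key_indices(key=42, pool_size=len(carrier), count=16)
    secret = fill(carrier, shapes, idx)
    secret["fc.bias"] = [0.0] * 4  # pretend a secret-task step changed the bias
    update(carrier, secret, shapes, idx)
    print(len(secret["fc.weight"]), len(secret["fc.bias"]))
```

This reading uses positional sharing, so the secret model never occupies extra parameters. Its values live inside the carrier's own parameters, and updates from the open and secret tasks act on the same underlying numbers. That sharing would explain why the paper emphasizes the capacity of the carrier model.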
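The experiment-setup row mentions 100-dimensional pseudorandom noise vectors fed to a generator-like secret model. One plausible reading, sketched here purely as an illustration, works as follows. The sender and receiver share a secret key, so the receiver can regenerate exactly the same noise vectors. It then recovers the hidden samples by running the secret model on them.

Three parts of the sketch are our own assumptions, not details from the paper:

- the seeding scheme (key plus sample index),
- the Gaussian noise,
- the one-layer `toy_generator`.

The paper describes its actual secret architecture only in the supplementary material. The names `noise_vector`, `toy_generator`, and `decode` are hypothetical.

```python
import math
import random
from typing import List

NOISE_DIM = 100  # noise dimension reported in the experiment setup


def noise_vector(key: int, sample_index: int, dim: int = NOISE_DIM) -> List[float]:
    """Hypothetical: reproducible Gaussian noise for one hidden sample,
    seeded by the shared secret key and the sample's index."""
    rng = random.Random(f"{key}:{sample_index}")  # str seeds hash deterministically
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generator(weights: List[List[float]], z: List[float]) -> List[float]:
    """Stand-in for the generator-like secret model: one linear layer
    followed by tanh, producing one output value per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in weights]


def decode(weights: List[List[float]], key: int, num_samples: int) -> List[List[float]]:
    """Receiver side: regenerate each noise vector from the key and map it
    through the secret model to recover the hidden samples."""
    return [toy_generator(weights, noise_vector(key, i)) for i in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy secret-model weights: 8 output values per sample, NOISE_DIM inputs.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(NOISE_DIM)] for _ in range(8)]
    samples = decode(weights, key=1234, num_samples=3)
    print(len(samples), len(samples[0]))
```

In this reading, only the key and the secret model's weights need to reach the receiver, and the weights are hidden inside the carrier. The noise vectors themselves are never transmitted.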