SILENCE: Protecting privacy in offloaded speech understanding on resource-constrained devices
Authors: Dongqi Cai, Shangguang Wang, Zeling Zhang, Felix Xiaozhu Lin, Mengwei Xu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have implemented SILENCE on the STM32H7 microcontroller and evaluated its efficacy under different attacking scenarios. Our results demonstrate that SILENCE offers speech understanding performance and privacy protection capacity comparable to existing encoders, while achieving up to 53.3× speedup and 134.1× reduction in memory footprint. |
| Researcher Affiliation | Academia | Dongqi Cai, Beiyou Shenzhen Institute, Shenzhen, Guangdong, cdq@bupt.edu.cn; Shangguang Wang, Beiyou Shenzhen Institute, Shenzhen, Guangdong, sgwang@bupt.edu.cn; Zeling Zhang, Beiyou Shenzhen Institute, Shenzhen, Guangdong, marovlo@bupt.edu.cn; Felix Xiaozhu Lin, University of Virginia, Charlottesville, VA, 22904, felixlin@virginia.edu; Mengwei Xu, Beiyou Shenzhen Institute, Shenzhen, Guangdong, mwx@bupt.edu.cn |
| Pseudocode | No | The paper describes the system's design and workflow using textual descriptions and diagrams (e.g., Figure 3, Figure 5), but it does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | We will open-source all the code and checkpoints to facilitate further research in this direction. |
| Open Datasets | Yes | We run our experiments on SLURP [28] and FSC [51]. Following prior work [15], we choose the large-scale English read-speech corpus LibriSpeech [52] for a multi-task protection scenario. |
| Dataset Splits | No | The paper mentions using SLURP and FSC datasets and refers to a 'SLURP training set' for a specific model, but it does not explicitly specify the overall training, validation, and test splits (e.g., as percentages or sample counts) for the main experiments. |
| Hardware Specification | Yes | Offline training is simulated on a server with 8 NVIDIA A40 GPUs. The trained mask generator is deployed onto the STM32H7 [26] or Raspberry Pi 4 (RPi-4B) [45]. The STM32H7 is a resource-constrained microcontroller with 1MB RAM; the RPi-4B is a popular development board with 4GB RAM. |
| Software Dependencies | No | We have fully implemented the SILENCE prototype atop SpeechBrain [50], a PyTorch-based and unified speech toolkit. This names software (SpeechBrain, PyTorch) but provides no specific version numbers. |
| Experiment Setup | Yes | During the offline phase in Figure 5, we use the Adam optimizer with a learning rate of 1e-5 and a batch size of 4. For the inference step, we use a batch size of 1 to simulate the real streaming audio input scenario. The KL threshold λ is set to 0.15 for all mask generators. |
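The reported hyperparameters (Adam, lr 1e-5, training batch size 4, inference batch size 1, KL threshold λ = 0.15) can be collected into a minimal PyTorch sketch. Note this is illustrative only: the `MaskGenerator` architecture, feature dimensions, and the placeholder loss are assumptions, not the paper's actual implementation (which is built atop SpeechBrain and not released at the time of this report).

```python
import torch

# Hypothetical stand-in for the paper's on-device mask generator;
# the real architecture is not specified in this table.
class MaskGenerator(torch.nn.Module):
    def __init__(self, feat_dim=80):
        super().__init__()
        self.proj = torch.nn.Linear(feat_dim, feat_dim)

    def forward(self, feats):
        # Produce a soft (0, 1) mask over speech features.
        return torch.sigmoid(self.proj(feats))

LAMBDA_KL = 0.15          # KL threshold quoted from the paper
BATCH_SIZE_TRAIN = 4      # offline-phase batch size quoted from the paper
BATCH_SIZE_INFER = 1      # simulates streaming audio input at inference

model = MaskGenerator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# One illustrative offline training step on random features
# of shape (batch, frames, feature_dim).
feats = torch.randn(BATCH_SIZE_TRAIN, 100, 80)
mask = model(feats)
loss = mask.mean()  # placeholder loss; the real objective is task-specific
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The sketch only pins down the quoted optimizer settings; reproducing the paper's results would additionally require the mask-generator architecture, the privacy/utility objective, and the SLURP/FSC data pipelines.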