Privacy-Preserving End-to-End Spoken Language Understanding

Authors: Yinggui Wang, Wei Huang, Le Yang

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments over two SLU datasets show that the proposed method can reduce the accuracy of both the ASR and IR attacks close to that of a random guess, while leaving the SLU performance largely unaffected.
Researcher Affiliation Collaboration Yinggui Wang (Ant Group), Wei Huang (Ant Group) and Le Yang (University of Canterbury)
Pseudocode No The paper describes the model architecture and training processes in prose and diagrams (Figure 3, Figure 4) but does not include structured pseudocode or algorithm blocks.
Open Source Code No Our experiments were all conducted on the open source tool, i.e., Espnet [Watanabe et al., 2018]. This refers to a third-party tool the authors used, not a release of their own code; the paper contains no statement about, or link to, their own implementation.
Open Datasets Yes The five datasets were used to simulate realistic scenarios, namely SLURP [Bastianelli et al., 2020], Fluent Speech Commands (FSC) [Lugosch et al., 2019], Libri Speech [Panayotov et al., 2015], Voxceleb1 [Nagrani et al., 2017] and TED-Lium [Rousseau et al., 2012].
Dataset Splits No Libri Speech. Libri Speech is a large dataset consisting of approximately 1000 hours of read English speech, derived from audiobooks of the LibriVox project. In this paper, we use train-clean-360 and train-clean-100 to pretrain and train our ASR task. For testing, the test set of ASR or IR is passed through the encoder... There is no explicit mention of a validation set or of how data is split for validation; a minimal loading sketch is given below.
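To make the quoted LibriSpeech usage concrete, here is a minimal loading sketch. It assumes the torchaudio LibriSpeech loader and treats test-clean as the evaluation subset; the paper names only train-clean-360 and train-clean-100 and does not describe a validation split, so none appears here.

```python
# Minimal sketch, not the authors' code: the paper releases no implementation,
# so the loader (torchaudio) and the choice of test-clean are assumptions.
import torchaudio

root = "./data"  # hypothetical data directory

# Subsets named in the paper: pretraining and training of the ASR task
pretrain_set = torchaudio.datasets.LIBRISPEECH(root, url="train-clean-360", download=True)
train_set = torchaudio.datasets.LIBRISPEECH(root, url="train-clean-100", download=True)

# Evaluation: the paper only says the ASR or IR test set is passed through the encoder;
# test-clean is used here as a placeholder test split.
test_set = torchaudio.datasets.LIBRISPEECH(root, url="test-clean", download=True)
```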
Hardware Specification No The paper mentions that experiments were conducted using the Espnet tool but does not provide specific details about the hardware used (e.g., GPU/CPU models, memory, or cloud infrastructure specifications).
Software Dependencies No The paper states 'Our experiments were all conducted on the open source tool, i.e., Espnet [Watanabe et al., 2018]' and mentions 'Adam optimizer', but it does not specify version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup Yes First, 80-dimensional Fbank features are extracted as low-dimensional feature inputs, with a frame length of 25 ms and a 10 ms frame shift. In order to make the input size of the model match the selected data volume, we set the number of layers for the Encoder and Decoder to 12 and 6 respectively, and the attention dimension, number of heads and dropout ratio to 256, 4 and 0.1. The dimension of the individual SLU part of the SH-PPSLU model is set to 128, and the individual and shared part dimensions of the three tasks of the H-PPSLU model are both 64. The model is initialized with the weights of an ASR model trained for 50 epochs. During training, the algorithms are trained for 15 epochs using the Adam optimizer with a learning rate of 0.001 and an ASR loss coefficient of 0.1. For the adversarial training, the model is trained for 10 epochs.
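The quoted hyperparameters can be collected into a single configuration sketch, which makes them easier to compare against a re-implementation. The numeric values below are taken from the paper's description; the key names and overall structure are assumptions, since the authors' actual ESPnet configuration files are not available.

```python
# Hypothetical configuration sketch: values come from the quoted setup,
# key names are illustrative and not the authors' actual ESPnet config.
ppslu_config = {
    # Front-end: 80-dimensional Fbank features, 25 ms frames with 10 ms shift
    "frontend": {"n_mels": 80, "frame_length_ms": 25, "frame_shift_ms": 10},
    # Transformer encoder/decoder sizes described in the paper
    "encoder": {"num_layers": 12, "attention_dim": 256, "attention_heads": 4, "dropout": 0.1},
    "decoder": {"num_layers": 6, "attention_dim": 256, "attention_heads": 4, "dropout": 0.1},
    # Representation split: SH-PPSLU has a 128-dim individual SLU part;
    # H-PPSLU uses 64-dim individual and 64-dim shared parts for each of the three tasks
    "sh_ppslu_individual_dim": 128,
    "h_ppslu_individual_dim": 64,
    "h_ppslu_shared_dim": 64,
    # Training schedule: initialize from a 50-epoch ASR model, then train 15 epochs
    "init_weights": "asr_model_trained_50_epochs",
    "optimizer": {"name": "adam", "lr": 1e-3},
    "num_epochs": 15,
    "asr_loss_weight": 0.1,
    "adversarial_epochs": 10,
}
```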