Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Authors: Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%.
Researcher Affiliation | Collaboration | ¹Georgia Institute of Technology, ²Columbia University, ³IBM Research.
Pseudocode | Yes | Algorithm 1 summarizes the training procedure of our proposed V2S reprogramming algorithm. (A training-step sketch follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/huckiyang/Voice2Series-Reprogramming.
Open Datasets | Yes | Tested on a standard UCR time series classification benchmark (Dau et al., 2019).
Dataset Splits | Yes | We use 10-fold splitting on the training data to select the best-performing model based on the validation loss and report the accuracy on test data averaged over 10 runs, following a similar experimental setting to (Cabello et al., 2020). (A k-fold sketch follows the table.)
Hardware Specification | No | The paper mentions an "on-GPU audio preprocessing layer" but provides no specific hardware details (e.g., GPU model, CPU, memory).
Software Dependencies | Yes | We use TensorFlow (Abadi et al., 2016) (v2.2) to implement our V2S framework following Algorithm 1. To enable end-to-end V2S training, we use the Kapre toolkit (Choi et al., 2017) to incorporate an on-GPU audio preprocessing layer, as shown in Figure 2. (A front-end sketch follows the table.)
Experiment Setup | Yes | For the V2S parameters in Algorithm 1, we use α = 0.05 and a mini-batch size of 32 with T = 100 training epochs. We use maximal many-to-one random label mapping, which assigns ⌊|Y_S| / |Y_T|⌋ non-overlapping source labels to every target label, where |Y| is the size of the label set Y and ⌊z⌋ is the floor function that gives the largest integer not exceeding z. To stabilize the training process, we add weight decay as a regularization term to the V2S loss and set the regularization coefficient to 0.04. Our V2S implementation is open-source and available at https://github.com/huckiyang/Voice2Series-Reprogramming. For model tuning, we use dropout during training on the reprogramming parameters θ. Moreover, during input reprogramming we also replicate the target signal x_t into m segments and place them starting from the beginning of the reprogrammed input with an identical interval (see Figure 4(a) as an example with m = 3). For each task, we report the best result of V2S among a set of hyperparameters with dropout rate ∈ {0, 0.1, 0.2, 0.3, 0.4} and number of target-signal replications m ∈ {1, 2, ..., 10}. (Sketches of the label mapping and the training step follow the table.)
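
The "maximal many-to-one random label mapping" in the Experiment Setup row can be made concrete as an aggregation matrix. Below is a minimal sketch, assuming the common reprogramming convention of averaging the probabilities of the source classes assigned to each target class; the class counts (35 source, 5 target) are illustrative, not from the paper.

```python
import numpy as np
import tensorflow as tf

def random_many_to_one_map(num_source, num_target, seed=0):
    """Build a (num_source, num_target) matrix A with A[s, t] = 1/k for the
    k = floor(num_source / num_target) source labels randomly assigned to
    target label t, so that target probs = source probs @ A."""
    k = num_source // num_target               # floor(|Y_S| / |Y_T|)
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(num_source)[: k * num_target].reshape(num_target, k)
    A = np.zeros((num_source, num_target), dtype=np.float32)
    for t, sources in enumerate(chosen):
        A[sources, t] = 1.0 / k                # average the assigned source probs
    return tf.constant(A)

# Example: 35 source classes mapped onto 5 target classes (7 sources each).
label_map = random_many_to_one_map(35, 5)
```

Because the assignment is random and non-overlapping, any leftover source classes (when |Y_S| is not divisible by |Y_T|) simply go unused.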
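
Building on that mapping, here is a hedged sketch of one update of Algorithm 1 as described in the rows above: replicate the target series m times, zero-pad to the acoustic model's input length, add a trainable universal perturbation θ, and update only θ with Adam (α = 0.05) plus a weight-decay term (coefficient 0.04) on the loss. The stand-in convolutional `pretrained_model`, the lengths, and M_REPLICAS = 3 are placeholders, not the paper's actual attention-based acoustic model.

```python
import tensorflow as tf

SOURCE_LEN = 16000   # input length of the frozen acoustic model (assumption)
TARGET_LEN = 500     # target time-series length (task-dependent; assumption)
M_REPLICAS = 3       # target-signal replications, tuned in {1, ..., 10}
NUM_SOURCE = 35      # source classes (assumption)

# Stand-in for the frozen pretrained acoustic model; V2S would load a real one.
pretrained_model = tf.keras.Sequential([
    tf.keras.layers.Reshape((SOURCE_LEN, 1), input_shape=(SOURCE_LEN,)),
    tf.keras.layers.Conv1D(8, 9, strides=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_SOURCE, activation="softmax"),
])
pretrained_model.trainable = False

theta = tf.Variable(tf.zeros([SOURCE_LEN]), name="theta")  # universal perturbation
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)   # alpha = 0.05

def replicate_and_pad(x):
    """Tile the target series M_REPLICAS times from the start of the frame
    and zero-pad the remainder (cf. Figure 4(a) in the paper)."""
    tiled = tf.tile(x, [1, M_REPLICAS])
    return tf.pad(tiled, [[0, 0], [0, SOURCE_LEN - M_REPLICAS * TARGET_LEN]])

@tf.function
def v2s_train_step(x_batch, y_batch, label_map, wd=0.04):
    """One V2S update: only theta is trained; the acoustic model stays frozen.
    (The paper also applies dropout to theta during training; omitted here.)"""
    with tf.GradientTape() as tape:
        x_rep = replicate_and_pad(x_batch) + theta            # input reprogramming
        src = pretrained_model(x_rep, training=False)         # source-class probs
        tgt = tf.matmul(src, label_map)                       # many-to-one mapping
        ce = tf.keras.losses.sparse_categorical_crossentropy(y_batch, tgt)
        loss = tf.reduce_mean(ce) + wd * tf.nn.l2_loss(theta)  # weight decay on theta
    grads = tape.gradient(loss, [theta])
    optimizer.apply_gradients(zip(grads, [theta]))
    return loss
```

In the paper's setup this step would run inside a standard mini-batch loop with batch size 32 for T = 100 epochs.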
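
The Software Dependencies row pairs TensorFlow v2.2 with Kapre for the on-GPU audio preprocessing layer shown in Figure 2. A minimal sketch, assuming Kapre ≥ 0.3's `get_melspectrogram_layer` helper; the feature settings (FFT size, hop length, mel bins, 16 kHz mono input) are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf
from kapre.composed import get_melspectrogram_layer

# On-GPU mel-spectrogram front end. Because it runs inside the model graph,
# the raw reprogrammed waveform (including theta) stays differentiable
# end to end, which is what enables end-to-end V2S training.
mel_layer = get_melspectrogram_layer(
    input_shape=(16000, 1),   # 1 s of 16 kHz mono audio (assumption)
    n_fft=512,
    hop_length=256,
    sample_rate=16000,
    n_mels=40,
    return_decibel=True,
)

model = tf.keras.Sequential([
    mel_layer,
    # ... the pretrained acoustic model's layers would follow here ...
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(35, activation="softmax"),  # 35 source classes (assumption)
])
```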
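
Finally, the Dataset Splits protocol (10-fold splitting on training data, model selection by validation loss, test accuracy averaged over 10 runs) amounts to a standard selection loop. A sketch using scikit-learn's `KFold`; the `build_and_train` and `evaluate_on_test` callbacks are hypothetical placeholders, since the paper does not publish this harness in the text.

```python
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_selection(x_train, y_train, build_and_train, evaluate_on_test):
    """10-fold split on the training data: train on 9 folds, validate on 1,
    keep the model with the lowest validation loss, then score it on test."""
    best_model, best_val_loss = None, np.inf
    splitter = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, val_idx in splitter.split(x_train):
        model, val_loss = build_and_train(
            x_train[train_idx], y_train[train_idx],
            x_train[val_idx], y_train[val_idx])
        if val_loss < best_val_loss:
            best_model, best_val_loss = model, val_loss
    return evaluate_on_test(best_model)

# Reported numbers in the paper are test accuracies averaged over 10 such runs.
```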