Voice2Series: Reprogramming Acoustic Models for Time Series Classification
Authors: Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%. |
| Researcher Affiliation | Collaboration | 1Georgia Institute of Technology 2Columbia University 3IBM Research. |
| Pseudocode | Yes | Algorithm 1 summarizes the training procedure of our proposed V2S reprogramming algorithm. (A minimal training-step sketch appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/huckiyang/Voice2Series-Reprogramming. |
| Open Datasets | Yes | Tested on a standard UCR time series classification benchmark (Dau et al., 2019). |
| Dataset Splits | Yes | We use 10-fold splitting on training data to select the best performed model based on the validation loss and report the accuracy on test data with an average of 10 runs, which follows a similar experimental setting used in (Cabello et al., 2020). (A fold-selection sketch appears after this table.) |
| Hardware Specification | No | The paper mentions an “on-GPU audio preprocessing layer” but provides no specific hardware details (e.g., GPU model, CPU, memory). |
| Software Dependencies | Yes | We use Tensorflow (Abadi et al., 2016) (v2.2) to implement our V2S framework following Algorithm 1. To enable end-to-end V2S training, we use the Kapre toolkit (Choi et al., 2017) to incorporate an on-GPU audio preprocessing layer, as shown in Figure 2. |
| Experiment Setup | Yes | For the V2S parameters in Algorithm 1, we use α = 0.05 and a mini-batch size of 32 with T = 100 training epochs. We use maximal many-to-one random label mapping, which assigns ⌊\|Y_S\|/\|Y_T\|⌋ non-overlapping source labels to every target label, where \|Y\| is the size of the label set Y and ⌊z⌋ is the floor function that gives the largest integer not exceeding z. To stabilize the training process, we add weight decay as a regularization term to the V2S loss and set the regularization coefficient to be 0.04. Our V2S implementation is open-source and available at https://github.com/huckiyang/Voice2Series-Reprogramming. For model tuning, we use dropout during training on the reprogramming parameters θ. Moreover, during input reprogramming we also replicate the target signal x_t into m segments and place them starting from the beginning of the reprogrammed input with an identical interval (see Figure 4 (a) as an example with m = 3). For each task, we report the best result of V2S among a set of hyperparameters with dropout rate ∈ {0, 0.1, 0.2, 0.3, 0.4} and the number of target signal replications m ∈ {1, 2, …, 10}. (Sketches of the label mapping and the signal replication appear after this table.) |
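The maximal many-to-one random label mapping described in the Experiment Setup row admits a straightforward construction: each target label receives ⌊\|Y_S\|/\|Y_T\|⌋ randomly chosen, non-overlapping source labels, encoded here as a 0/1 matrix that aggregates source-class probabilities. This is a plausible sketch of the idea, not the exact implementation from the released code:

```python
import numpy as np

def random_many_to_one_map(num_source, num_target, seed=0):
    """Assign floor(|Y_S| / |Y_T|) non-overlapping, randomly chosen source labels
    to each target label, returned as a [num_source, num_target] 0/1 matrix."""
    k = num_source // num_target          # floor(|Y_S| / |Y_T|)
    perm = np.random.default_rng(seed).permutation(num_source)
    label_map = np.zeros((num_source, num_target), dtype=np.float32)
    for t in range(num_target):
        label_map[perm[t * k:(t + 1) * k], t] = 1.0
    return label_map

# E.g., 35 source (speech-command) classes mapped onto 2 target classes:
# each target class then aggregates the probabilities of 17 distinct source classes.
```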
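The target-signal replication can be read as placing m copies of x_t at equal intervals inside the (otherwise zero) source-length input, starting from index 0. The interval choice below (`source_len // m`) is an assumption consistent with "identical interval", not a detail confirmed by the paper:

```python
import numpy as np

def replicate_target(x_t, source_len, m):
    """Place m copies of the target series at equal intervals, starting at the
    beginning of a zero vector of source-model input length (cf. Figure 4(a), m = 3)."""
    target_len = len(x_t)
    out = np.zeros(source_len, dtype=np.float32)
    stride = source_len // m                  # identical interval between copies
    assert target_len <= stride, "replicas must not overlap"
    for i in range(m):
        out[i * stride:i * stride + target_len] = x_t
    return out
```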
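Putting the stated pieces together (α = 0.05, weight-decay coefficient 0.04, dropout on θ, a frozen pre-trained acoustic model, and a many-to-one label map), one training step in the spirit of Algorithm 1 might look as follows. Only those hyperparameters come from the paper; the optimizer choice, the exact dropout placement, and the shape constants are assumptions for illustration:

```python
import tensorflow as tf

# Illustrative sizes: a target series of length 96 reprogrammed into a
# 1-second 16 kHz source input.
L_T, L_S = 96, 16000

# theta: the trainable reprogramming perturbation (the source model stays frozen).
theta = tf.Variable(tf.random.normal([L_S], stddev=0.01), name="theta")
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)  # alpha = 0.05

def reprogram(x_t, dropout_rate=0.2):
    """Zero-pad the target batch to the source length and add the
    (dropout-regularized) perturbation theta."""
    pad = tf.zeros([tf.shape(x_t)[0], L_S - L_T])
    return tf.concat([x_t, pad], axis=1) + tf.nn.dropout(theta, rate=dropout_rate)

def train_step(x_t, y_t, source_model, label_map, wd=0.04):
    """One V2S update: forward through the frozen acoustic model, aggregate
    source-class probabilities via the many-to-one label map, update theta."""
    with tf.GradientTape() as tape:
        probs_s = tf.nn.softmax(source_model(reprogram(x_t), training=False))
        probs_t = tf.matmul(probs_s, label_map)       # [batch, num_target_classes]
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y_t, probs_t)
        ) + wd * tf.nn.l2_loss(theta)                 # weight-decay term (0.04)
    grads = tape.gradient(loss, [theta])
    optimizer.apply_gradients(zip(grads, [theta]))
    return loss
```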
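Finally, the split protocol (10-fold splitting of training data, model selection by validation loss, test accuracy then averaged over 10 runs) could be reproduced along these lines; `build_and_train` is a hypothetical stand-in for the V2S fitting routine:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_selection(x_train, y_train, build_and_train, n_splits=10, seed=0):
    """Select the model with the lowest validation loss across 10 folds."""
    best_model, best_val_loss = None, np.inf
    for tr_idx, va_idx in KFold(n_splits=n_splits, shuffle=True,
                                random_state=seed).split(x_train):
        model, val_loss = build_and_train(x_train[tr_idx], y_train[tr_idx],
                                          x_train[va_idx], y_train[va_idx])
        if val_loss < best_val_loss:
            best_model, best_val_loss = model, val_loss
    return best_model
```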