Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Authors: Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%.
Researcher Affiliation | Collaboration | ¹Georgia Institute of Technology, ²Columbia University, ³IBM Research.
Pseudocode | Yes | Algorithm 1 summarizes the training procedure of our proposed V2S reprogramming algorithm. (A training-step sketch follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/huckiyang/Voice2Series-Reprogramming.
Open Datasets | Yes | Tested on a standard UCR time series classification benchmark (Dau et al., 2019).
Dataset Splits | Yes | We use 10-fold splitting on the training data to select the best-performing model based on the validation loss and report the accuracy on test data averaged over 10 runs, following a similar experimental setting to (Cabello et al., 2020). (A k-fold sketch follows the table.)
Hardware Specification | No | The paper mentions an "on-GPU audio preprocessing layer" but provides no specific hardware details (e.g., GPU model, CPU, memory).
Software Dependencies | Yes | We use TensorFlow (Abadi et al., 2016) (v2.2) to implement our V2S framework following Algorithm 1. To enable end-to-end V2S training, we use the Kapre toolkit (Choi et al., 2017) to incorporate an on-GPU audio preprocessing layer, as shown in Figure 2. (A front-end sketch follows the table.)
Experiment Setup | Yes | For the V2S parameters in Algorithm 1, we use α = 0.05 and a mini-batch size of 32 with T = 100 training epochs. We use maximal many-to-one random label mapping, which assigns ⌊|Y_S| / |Y_T|⌋ non-overlapping source labels to every target label, where |Y| is the size of the label set Y and ⌊z⌋ is the floor function that gives the largest integer not exceeding z. To stabilize the training process, we add weight decay as a regularization term to the V2S loss and set the regularization coefficient to 0.04. Our V2S implementation is open-source and available at https://github.com/huckiyang/Voice2Series-Reprogramming. For model tuning, we use dropout during training on the reprogramming parameters θ. Moreover, during input reprogramming we also replicate the target signal x_t into m segments and place them starting from the beginning of the reprogrammed input with an identical interval (see Figure 4(a) as an example with m = 3). For each task, we report the best result of V2S among a set of hyperparameters with dropout rate ∈ {0, 0.1, 0.2, 0.3, 0.4} and number of target-signal replications m ∈ {1, 2, ..., 10}. (Sketches of the label mapping and the training step follow the table.)
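
The "maximal many-to-one random label mapping" in the Experiment Setup row can be made concrete as an aggregation matrix. Below is a minimal sketch, assuming the common reprogramming convention of averaging the probabilities of the source classes assigned to each target class; the class counts (35 source, 5 target) are illustrative, not from the paper.

```python
import numpy as np
import tensorflow as tf

def random_many_to_one_map(num_source, num_target, seed=0):
    """Build a (num_source, num_target) matrix A with A[s, t] = 1/k for the
    k = floor(num_source / num_target) source labels randomly assigned to
    target label t, so that target probs = source probs @ A."""
    k = num_source // num_target               # floor(|Y_S| / |Y_T|)
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(num_source)[: k * num_target].reshape(num_target, k)
    A = np.zeros((num_source, num_target), dtype=np.float32)
    for t, sources in enumerate(chosen):
        A[sources, t] = 1.0 / k                # average the assigned source probs
    return tf.constant(A)

# Example: 35 source classes mapped onto 5 target classes (7 sources each).
label_map = random_many_to_one_map(35, 5)
```

Because the assignment is random and non-overlapping, any leftover source classes (when |Y_S| is not divisible by |Y_T|) simply go unused.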
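
Building on that mapping, here is a hedged sketch of one update of Algorithm 1 as described in the rows above: replicate the target series m times, zero-pad to the acoustic model's input length, add a trainable universal perturbation θ, and update only θ with Adam (α = 0.05) plus a weight-decay term (coefficient 0.04) on the loss. The stand-in convolutional `pretrained_model`, the lengths, and M_REPLICAS = 3 are placeholders, not the paper's actual attention-based acoustic model.

```python
import tensorflow as tf

SOURCE_LEN = 16000   # input length of the frozen acoustic model (assumption)
TARGET_LEN = 500     # target time-series length (task-dependent; assumption)
M_REPLICAS = 3       # target-signal replications, tuned in {1, ..., 10}
NUM_SOURCE = 35      # source classes (assumption)

# Stand-in for the frozen pretrained acoustic model; V2S would load a real one.
pretrained_model = tf.keras.Sequential([
    tf.keras.layers.Reshape((SOURCE_LEN, 1), input_shape=(SOURCE_LEN,)),
    tf.keras.layers.Conv1D(8, 9, strides=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_SOURCE, activation="softmax"),
])
pretrained_model.trainable = False

theta = tf.Variable(tf.zeros([SOURCE_LEN]), name="theta")  # universal perturbation
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)   # alpha = 0.05

def replicate_and_pad(x):
    """Tile the target series M_REPLICAS times from the start of the frame
    and zero-pad the remainder (cf. Figure 4(a) in the paper)."""
    tiled = tf.tile(x, [1, M_REPLICAS])
    return tf.pad(tiled, [[0, 0], [0, SOURCE_LEN - M_REPLICAS * TARGET_LEN]])

@tf.function
def v2s_train_step(x_batch, y_batch, label_map, wd=0.04):
    """One V2S update: only theta is trained; the acoustic model stays frozen.
    (The paper also applies dropout to theta during training; omitted here.)"""
    with tf.GradientTape() as tape:
        x_rep = replicate_and_pad(x_batch) + theta            # input reprogramming
        src = pretrained_model(x_rep, training=False)         # source-class probs
        tgt = tf.matmul(src, label_map)                       # many-to-one mapping
        ce = tf.keras.losses.sparse_categorical_crossentropy(y_batch, tgt)
        loss = tf.reduce_mean(ce) + wd * tf.nn.l2_loss(theta)  # weight decay on theta
    grads = tape.gradient(loss, [theta])
    optimizer.apply_gradients(zip(grads, [theta]))
    return loss
```

In the paper's setup this step would run inside a standard mini-batch loop with batch size 32 for T = 100 epochs.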
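
The Software Dependencies row pairs TensorFlow v2.2 with Kapre for the on-GPU audio preprocessing layer shown in Figure 2. A minimal sketch, assuming Kapre ≥ 0.3's `get_melspectrogram_layer` helper; the feature settings (FFT size, hop length, mel bins, 16 kHz mono input) are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf
from kapre.composed import get_melspectrogram_layer

# On-GPU mel-spectrogram front end. Because it runs inside the model graph,
# the raw reprogrammed waveform (including theta) stays differentiable
# end to end, which is what enables end-to-end V2S training.
mel_layer = get_melspectrogram_layer(
    input_shape=(16000, 1),   # 1 s of 16 kHz mono audio (assumption)
    n_fft=512,
    hop_length=256,
    sample_rate=16000,
    n_mels=40,
    return_decibel=True,
)

model = tf.keras.Sequential([
    mel_layer,
    # ... the pretrained acoustic model's layers would follow here ...
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(35, activation="softmax"),  # 35 source classes (assumption)
])
```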
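
Finally, the Dataset Splits protocol (10-fold splitting on training data, model selection by validation loss, test accuracy averaged over 10 runs) amounts to a standard selection loop. A sketch using scikit-learn's `KFold`; the `build_and_train` and `evaluate_on_test` callbacks are hypothetical placeholders, since the paper does not publish this harness in the text.

```python
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_selection(x_train, y_train, build_and_train, evaluate_on_test):
    """10-fold split on the training data: train on 9 folds, validate on 1,
    keep the model with the lowest validation loss, then score it on test."""
    best_model, best_val_loss = None, np.inf
    splitter = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, val_idx in splitter.split(x_train):
        model, val_loss = build_and_train(
            x_train[train_idx], y_train[train_idx],
            x_train[val_idx], y_train[val_idx])
        if val_loss < best_val_loss:
            best_model, best_val_loss = model, val_loss
    return evaluate_on_test(best_model)

# Reported numbers in the paper are test accuracies averaged over 10 such runs.
```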