Temporally Correlated Task Scheduling for Sequence Learning
Authors: Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method significantly improves the performance of simultaneous machine translation and stock trend forecasting. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, Hefei, Anhui, China 2Microsoft Research, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 The optimization algorithm. |
| Open Source Code | Yes | Our code for the simultaneous translation task is at https://github.com/shirley-wu/simul-mt_temporally-correlated-task-scheduling, and the code for stock price forecasting is at https://github.com/microsoft/qlib/tree/main/examples/benchmarks/TCTS. |
| Open Datasets | Yes | For IWSLT 14 En→De, following (Edunov et al., 2018), we split 7k sentences from the training corpus for validation, and the test set is the concatenation of tst2010, tst2011, tst2012, dev2010 and dev2012. For IWSLT 15 En→Vi, following (Ma et al., 2020), we use tst2012 as the validation set and tst2013 as the test set. For IWSLT 17 En→Zh, we concatenate tst2013, tst2014 and tst2015 as the validation set and use tst2017 as the test set. For WMT 15 En→De, following (Ma et al., 2019; Arivazhagan et al., 2019), we use newstest2013 as the validation set and newstest2015 as the test set. More details about datasets can be found in Appendix C.1. ... Dataset: We use the historical transaction data for 300 stocks on CSI300 (CSI300, 2008) from 01/01/2008 to 08/01/2020. We split the data into training (01/01/2008-12/31/2013), validation (01/01/2014-12/31/2015), and test sets (01/01/2016-08/01/2020) based on the transaction time. All the data is provided by Yang et al. (2020). |
| Dataset Splits | Yes | For IWSLT 14 En→De, following (Edunov et al., 2018), we split 7k sentences from the training corpus for validation, and the test set is the concatenation of tst2010, tst2011, tst2012, dev2010 and dev2012. For IWSLT 15 En→Vi, following (Ma et al., 2020), we use tst2012 as the validation set and tst2013 as the test set. For IWSLT 17 En→Zh, we concatenate tst2013, tst2014 and tst2015 as the validation set and use tst2017 as the test set. For WMT 15 En→De, following (Ma et al., 2019; Arivazhagan et al., 2019), we use newstest2013 as the validation set and newstest2015 as the test set. ... We split the data into training (01/01/2008-12/31/2013), validation (01/01/2014-12/31/2015), and test sets (01/01/2016-08/01/2020) based on the transaction time. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) were provided. |
| Software Dependencies | No | The paper mentions software like 'momentum SGD, Adam', 'Transformer', 'GRU', 'LightGBM', 'GAT', but does not provide specific version numbers for these or other key software components or libraries. |
| Experiment Setup | Yes | For IWSLT En→Zh and En→Vi, we use the transformer small model, where the embedding dimension, feed-forward layer dimension and number of layers are 512, 1024 and 6, respectively. For IWSLT En→De, we use the same architecture but change the embedding dimension to 256. For WMT 15 En→De, we use the transformer big setting, where the above three numbers are 1024, 4096 and 6, respectively. The scheduler ϕ for each task is a multilayer perceptron (MLP) with one hidden layer and the tanh activation function. The size of the hidden layer is 256. ... The input of ϕ is a 7-dimensional vector with the following features: (1) the ratios of the source/target sentence lengths to the average source/target sentence lengths over all training data (2 dimensions), i.e., L_x/(Σ_{x∈X} L_x/\|X\|) and L_y/(Σ_{y∈Y} L_y/\|Y\|); (2) the training loss on data (x, y) evaluated on the main task wait-k; (3) the average of historical training losses; (4) the validation loss of the previous epoch; (5) the average of historical validation losses; (6) the ratio of the current training step to the total number of training iterations. ... Input: Training episodes E; internal update iterations S; learning rate η1 of the model f(·; θ); learning rate η2 of the scheduler ϕ(·; ω); batch size B. |
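
The scheduler described in the experiment setup (an MLP with one hidden layer of size 256, a tanh activation, and a 7-dimensional input) can be sketched as below. This is a minimal illustration assuming PyTorch; the output size `num_tasks` and the softmax over candidate tasks are assumptions for readability, not details quoted from the paper.

```python
# Minimal sketch of the scheduler ϕ, assuming PyTorch.
# Grounded details: one hidden layer of size 256, tanh activation,
# 7-dimensional input. `num_tasks` and the softmax output are assumptions.
import torch
import torch.nn as nn


class TaskScheduler(nn.Module):
    def __init__(self, num_features: int = 7, hidden_size: int = 256, num_tasks: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_features, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, num_tasks),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, 7) vector of length ratios, training/validation
        # loss statistics, and training progress, as listed in the table above.
        return torch.softmax(self.mlp(features), dim=-1)


# Example: score candidate auxiliary tasks for a batch of 32 examples.
scheduler = TaskScheduler()
task_probs = scheduler(torch.randn(32, 7))  # shape (32, num_tasks)
```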
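
For completeness, the following helper sketches how the seven scheduler input features listed in the setup might be assembled. Only the feature definitions come from the quoted text; the function name, argument layout, and the commented usage line are hypothetical.

```python
# Hypothetical helper that builds the 7-dimensional scheduler input.
import torch


def scheduler_features(src_len, tgt_len, avg_src_len, avg_tgt_len,
                       train_loss, hist_train_losses,
                       prev_valid_loss, hist_valid_losses,
                       step, total_steps) -> torch.Tensor:
    feats = [
        src_len / avg_src_len,                            # (1) source length ratio
        tgt_len / avg_tgt_len,                            # (1) target length ratio
        train_loss,                                       # (2) loss on (x, y) under the main wait-k task
        sum(hist_train_losses) / len(hist_train_losses),  # (3) average historical training loss
        prev_valid_loss,                                  # (4) validation loss of the previous epoch
        sum(hist_valid_losses) / len(hist_valid_losses),  # (5) average historical validation loss
        step / total_steps,                               # (6) training progress
    ]
    return torch.tensor(feats, dtype=torch.float32)


# Example usage with the scheduler sketch above (values are arbitrary):
# probs = TaskScheduler()(scheduler_features(20, 18, 23.5, 21.0, 3.2, [3.5, 3.4],
#                                            2.9, [3.1, 3.0], 1000, 50000).unsqueeze(0))
```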