DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation
Authors: Wendi Li, Xiao Yang, Weiqing Liu, Yingce Xia, Jiang Bian
Pages: 4092-4100
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three real-world tasks (forecasting on stock price trend, electricity load and solar irradiance) and obtain significant improvement on multiple widely-used models. Extensive experiments have been conducted on a variety of popular streaming data scenarios with multiple widely-used learning models to evaluate the effectiveness of our proposed method DDG-DA. Experimental results have shown that DDG-DA could enhance the performance of learning tasks in all these scenarios. |
| Researcher Affiliation | Collaboration | Wendi Li¹,²\*, Xiao Yang²\*, Weiqing Liu², Yingce Xia², Jiang Bian². ¹University of Wisconsin-Madison; ²Microsoft Research. Wendi.Li@wisc.edu, {Xiao.Yang, Weiqing.Liu, Yingce.Xia, Jiang.Bian}@microsoft.com |
| Pseudocode | No | The paper describes the method using diagrams and mathematical formulations, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of DDG-DA is open-source on GitHub (https://github.com/Microsoft/qlib/tree/main/examples/benchmarks/dynamic/DDG-DA). |
| Open Datasets | Yes | The experiments are conducted on multiple datasets in three real-world popular scenarios (forecasting on stock price trend, electricity load and solar irradiance (Grinold and Kahn 2000; Pedro, Larson, and Coimbra 2019)). |
| Dataset Splits | Yes | For each timestamp $t$, the target of $\mathrm{task}^{(t)} := (D^{(t)}_{\mathrm{train}}, D^{(t)}_{\mathrm{test}})$ is to learn a new model or adapt an existing model on historical data $D^{(t)}_{\mathrm{train}}$ and minimize the loss on $D^{(t)}_{\mathrm{test}}$. These tasks are arranged in chronological order and separated at the beginning of 2016 into $\mathrm{Task}_{\mathrm{train}}$ (all $D^{(t)}_{\mathrm{test}}$ in $\mathrm{Task}_{\mathrm{train}}$ range from 2011 to 2015) and $\mathrm{Task}_{\mathrm{test}}$ (all $D^{(t)}_{\mathrm{test}}$ in $\mathrm{Task}_{\mathrm{test}}$ range from 2016 to 2020). On the validation data, LightGBM (Ke et al. 2017) performs best on average. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions software such as LightGBM, LSTM, and GRU models, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To handle the concept drift in data, we retrain a new model each month (the rolling time interval is 1 month) based on two years of historical data (memory size is limited in an online setting). The time interval of two adjacent tasks is 20 trading days, 7 days and 7 days in stock price trend forecasting, electricity load forecasting and solar irradiance forecasting, respectively. |
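
The rolling setup and the chronological split described in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch. It is not taken from the paper or from the Qlib code. It assumes calendar-month windows (the paper states task intervals in trading days or days per dataset), a two-year training window ending where the test window begins, and hypothetical names (`RollingTask`, `add_months`, `make_rolling_tasks`, `split_tasks`) introduced only for this example.

```python
from __future__ import annotations

from datetime import date
from typing import NamedTuple


class RollingTask(NamedTuple):
    """One rolling task: a training window followed by a test window."""
    train_start: date  # inclusive
    train_end: date    # exclusive, equal to test_start
    test_start: date   # inclusive
    test_end: date     # exclusive


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def make_rolling_tasks(first_test: date, last_test: date,
                       train_months: int = 24,
                       step_months: int = 1) -> list[RollingTask]:
    """Build one task per step; each trains on the preceding `train_months`."""
    tasks = []
    test_start = first_test
    while test_start <= last_test:
        tasks.append(RollingTask(
            train_start=add_months(test_start, -train_months),
            train_end=test_start,
            test_start=test_start,
            test_end=add_months(test_start, step_months),
        ))
        test_start = add_months(test_start, step_months)
    return tasks


def split_tasks(tasks: list[RollingTask], boundary: date):
    """Split tasks chronologically by where their test window starts."""
    task_train = [t for t in tasks if t.test_start < boundary]
    task_test = [t for t in tasks if t.test_start >= boundary]
    return task_train, task_test


# Assumed ranges: test windows from January 2011 through December 2020,
# separated at the beginning of 2016 as stated in the table above.
tasks = make_rolling_tasks(date(2011, 1, 1), date(2020, 12, 1))
task_train, task_test = split_tasks(tasks, boundary=date(2016, 1, 1))
```

Under these assumptions, every test window in `task_train` falls within 2011 to 2015 and every test window in `task_test` falls within 2016 to 2020. Replacing calendar months with 20-trading-day or 7-day steps would require a trading calendar or day-based arithmetic, which this sketch leaves out.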