Recursive Time Series Data Augmentation
Authors: Amine Mohamed Aboussalah, Minjae Kwon, Raj G Patel, Cheng Chi, Chi-Guhn Lee
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply RIM to diverse synthetic and real-world time series cases to achieve strong performance over non-augmented data on a variety of learning tasks. We empirically demonstrate learning improvements using synthetic data as well as real world time series datasets. |
| Researcher Affiliation | Academia | ¹New York University, ²University of Virginia, ³University of Toronto |
| Pseudocode | Yes | Algorithm 1: RIM: RL Training |
| Open Source Code | Yes | The code for all the experiments can be found at the following link. https://anonymous.4open.science/r/RIM-Time-Series-Data-Augmentation-D02A/ |
| Open Datasets | Yes | Task 3: Real dataset, Indoor User Movement from the Radio Signal Strength (RSS) Data. This binary classification task from Bacciu et al. (2014) is associated with predicting the pattern of user movements in real-world office environments from time series generated by a Wireless Sensor Network (WSN). Task 4: Real dataset, Ford Engine Condition. We use a subset of the Ford A dataset from the 2008 WCCI Ford classification challenge (Abou-Nasr & Feldkamp, 2007). Predicting Air Quality: The restricted air quality dataset contains 1200 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors. This is a time series regression task where the target is the next time step’s CO concentration. The input data contains the last six time steps of 10 features, as used in De Vito et al. (2008) and for Machine Learning & Repository. |
| Dataset Splits | No | The paper does not specify explicit validation set splits (e.g., percentages or exact counts). While it mentions "validation cumulative return" in tables, it doesn't detail the method of splitting data into a distinct validation set for reproducibility. |
| Hardware Specification | Yes | For our experiments, we train the TimeGAN for 2500 epochs (3 hours on Xeon Processors CPU) for synthetic datasets and 5000 epochs (6 hours on Xeon Processors CPU) for real datasets. |
| Software Dependencies | No | The paper mentions types of models and optimizers used (e.g., Convolutional Neural Network, LSTM, Adam optimizer, Batch Norm) but does not provide specific version numbers for any software libraries, frameworks, or programming languages (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Test Accuracy for the exponential synthetic ODE system using a Convolutional Neural Network with kernel size=3, filter=32, batch size=16, using Batch Norm and Adam optimizer. Test Accuracy for the SPY500 Dataset using an LSTM model with 2 LSTM layers (200 neurons), 2 dense layers (100 neurons), lr=1e-4, batch size=16. For the indoor movement classification task, we conducted the same experiment as in Section 4 with 9 different hyperparameter configurations, as shown in Table 1, and observed that in all cases RIM outperforms the non-augmented case (with smaller mean test loss and higher mean test accuracy), which supports our claim that RIM improves model performance. |
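
The air quality task in the Open Datasets row is described as predicting the next time step's CO concentration from the last six time steps of 10 features. The sketch below shows one way such input/target pairs could be built from an hourly table. It is an illustration only, not the authors' preprocessing: the function name `make_windows`, the `target_col` parameter, and the assumption that CO sits in a known column are all hypothetical.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(
    rows: Sequence[Sequence[float]],
    target_col: int,
    window: int = 6,
) -> List[Tuple[Window, float]]:
    """Build (input window, next-step target) pairs from a time-ordered table.

    rows       -- one row per time step, each with the same number of features
    target_col -- hypothetical index of the CO concentration column
    window     -- number of past time steps in each input (six in the summary)
    """
    samples: List[Tuple[Window, float]] = []
    # Start at `window` so that every sample has a full history behind it.
    for t in range(window, len(rows)):
        # Input: the `window` rows strictly before time step t.
        x = [list(r) for r in rows[t - window:t]]
        # Target: the CO value at time step t, i.e. the next step after the window.
        y = rows[t][target_col]
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    # Toy data: 8 time steps, 10 features each (values are placeholders).
    toy = [[float(i)] * 10 for i in range(8)]
    pairs = make_windows(toy, target_col=0)
    print(len(pairs), "samples; first target =", pairs[0][1])
```

With `n` time steps and a window of six, this yields `n - 6` samples, each input holding 6 x 10 values. Whether the paper overlaps windows in this way is not stated in the summary above.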