Fine-grained Control of Generative Data Augmentation in IoT Sensing
Authors: Tianshi Wang, Qikai Yang, Ruijie Wang, Dachun Sun, Jinyang Li, Yizhuo Chen, Yigong Hu, Chaoqi Yang, Tomoyoshi Kimura, Denizhan Kara, Tarek Abdelzaher
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated across various generative models, datasets, and downstream IoT sensing models. The results demonstrate that our approach surpasses the conventional transformation-based data augmentation techniques and prior generative data augmentation models. |
| Researcher Affiliation | Academia | Tianshi Wang, Qikai Yang, Ruijie Wang, Dachun Sun, Jinyang Li, Yizhuo Chen, Yigong Hu, Chaoqi Yang, Tomoyoshi Kimura, Denizhan Kara, Tarek Abdelzaher University of Illinois Urbana-Champaign, USA {tianshi3, qikaiy2, ruijiew2, dsun18, jinyang7, yizhuoc, yigongh2, chaoqiy2, tkimura4, kara4, zaher}@illinois.edu |
| Pseudocode | No | The paper presents diagrams and textual descriptions of its pipeline and methods (e.g., Figure 2, Section 3.2-3.4) but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | We do not plan to release our code or data before paper acceptance. |
| Open Datasets | Yes | Datasets: (1) Human-Activity Recognition (HAR): We use the RealWorld-HAR dataset [47]... (3) Harmful brain activity recognition: We adopt the Harmful Brain Activity Classification [18, 19] dataset... Public information and video footage about the human subjects can be found at: https://www.uni-mannheim.de/dws/research/projects/activity-recognition/dataset/dataset-realworld/ The two public datasets (Human Activity Recognition and Harmful Brain Activity Recognition) employed in this paper are both thoroughly documented and accessible online. |
| Dataset Splits | Yes | HAR: We divide the dataset randomly by subjects, assigning 10 subjects to the training set and the remaining 5 to the validation set, and segment each recording into 2.5-second segments; this partition results in 4,959 training samples and 2,309 validation samples. Vehicle detection: The initial 30 minutes of data are used for training, while the final 10 minutes serve as validation, resulting in 8,015 training samples and 2,763 validation samples. Harmful brain activity recognition: We allocate 80% of the original EEG recordings to the training set and 20% to the testing set, resulting in 78,548 training samples and 19,532 validation samples. (A subject-level split sketch follows the table.) |
| Hardware Specification | Yes | Our model was trained on a desktop with an Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz and 4 Nvidia GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'U-Net', 'VAE', 'Transformer', and 'DeepSense', but it does not specify any version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train the diffusion model using an Adam optimizer with a learning rate of 0.0001, paired with a cosine annealing learning rate scheduler. The model is trained for 1,000 epochs on the HAR and vehicle detection datasets, and for 200 epochs on the harmful brain activity recognition dataset. The batch size is set at 200 for the HAR and vehicle detection datasets, and at 64 for the harmful brain activity recognition dataset. |
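The HAR split described in the Dataset Splits row (whole subjects assigned to train or validation, then non-overlapping 2.5-second segments) is straightforward to mirror. Below is a minimal sketch under stated assumptions: the `recordings` dict keyed by subject ID, the helper names `segment_recording` and `split_by_subject`, and the 50 Hz sampling rate are hypothetical placeholders, since the authors do not release their preprocessing code.

```python
import numpy as np

SAMPLING_RATE = 50            # assumed Hz for the IMU streams; not stated in this excerpt
SEGMENT_SECONDS = 2.5         # segment length reported in the paper
SEGMENT_LEN = int(SAMPLING_RATE * SEGMENT_SECONDS)

def segment_recording(signal: np.ndarray) -> np.ndarray:
    """Cut a (time, channels) recording into non-overlapping 2.5-second windows."""
    n_segments = signal.shape[0] // SEGMENT_LEN
    return signal[: n_segments * SEGMENT_LEN].reshape(n_segments, SEGMENT_LEN, -1)

def split_by_subject(recordings: dict, n_train_subjects: int = 10, seed: int = 0):
    """Randomly assign whole subjects to train/validation, then segment each recording."""
    rng = np.random.default_rng(seed)
    keys = sorted(recordings)                          # subject IDs
    order = rng.permutation(len(keys))                 # shuffle subjects, not samples
    train_ids = [keys[i] for i in order[:n_train_subjects]]
    val_ids = [keys[i] for i in order[n_train_subjects:]]
    train = np.concatenate([segment_recording(recordings[s]) for s in train_ids])
    val = np.concatenate([segment_recording(recordings[s]) for s in val_ids])
    return train, val
```

Splitting by subject rather than by sample keeps segments from the same person out of both sets, which matches the per-subject counts quoted above.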
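Likewise, the reported training configuration (Adam with a 0.0001 learning rate, a cosine annealing scheduler, 1,000 epochs and batch size 200 for HAR and vehicle detection, 200 epochs and batch size 64 for harmful brain activity recognition) maps onto standard PyTorch components. The sketch below is an assumption about how those settings fit together, not the authors' code; `make_trainer`, the dataset-name strings, and the stand-in module are hypothetical.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def make_trainer(model: nn.Module, dataset, dataset_name: str):
    """Assemble the loader, optimizer, and scheduler with the per-dataset settings
    quoted in the Experiment Setup row; the helper itself is hypothetical."""
    if dataset_name in ("har", "vehicle_detection"):
        epochs, batch_size = 1_000, 200    # reported for HAR and vehicle detection
    else:                                  # harmful brain activity recognition
        epochs, batch_size = 200, 64

    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = optim.Adam(model.parameters(), lr=1e-4)                        # Adam, lr 0.0001
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)  # cosine annealing
    return loader, optimizer, scheduler, epochs

# Dummy usage: random IMU-like tensors and a placeholder module, not the paper's diffusion model.
x = torch.randn(1_000, 6, 125)                       # (samples, channels, time steps)
parts = make_trainer(nn.Conv1d(6, 6, kernel_size=3, padding=1), TensorDataset(x), "har")
```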