DiffAR: Adaptive Conditional Diffusion Model for Temporal-augmented Human Activity Recognition

Authors: Shuokang Huang, Po-Yu Chen, Julie McCann

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four public datasets show that DiffAR achieves the best synthesis quality of augmented CSI and outperforms state-of-the-art CSI-based HAR methods in terms of recognition performance.
Researcher Affiliation | Collaboration | Shuokang Huang (1), Po-Yu Chen (1,2) and Julie McCann (1); (1) Imperial College London, (2) JPMorgan Chase & Co. {s.huang21, po-yu.chen11, j.mccann}@imperial.ac.uk
Pseudocode | Yes | Algorithm 1 Training; Algorithm 2 Synthesis
Open Source Code | Yes | The source code of DiffAR is available at https://github.com/huangshk/DiffAR.
Open Datasets | Yes | We evaluate DiffAR on four public datasets, which differ in the number of samples, the number of activities, sample rate and window sizes. The variety of datasets enables a comprehensive evaluation. Table 1 describes the statistics of datasets. Office [Yousefi et al., 2017] contains 557 CSI recordings of 6 individuals in an office area. SignFi [Ma et al., 2018] involves 276 activities (sign language gestures) captured by WiFi CSI... Interactions [Alazrai et al., 2020] consists of CSI samples... Widar 3.0 [Zhang et al., 2021] includes CSI samples...
Dataset Splits | Yes | Each dataset is split into a training set (80%), a validation set (10%), and a test set (10%).
Hardware Specification | Yes | We implement DiffAR using PyTorch 1.13 with Python 3.9 and train it on a single Nvidia RTX A5000 GPU.
Software Dependencies | Yes | We implement DiffAR using PyTorch 1.13 with Python 3.9 and train it on a single Nvidia RTX A5000 GPU.
Experiment Setup | Yes | In ACDM, we establish 10 residual blocks whose dimension for skip connections is 32. Each residual block applies multi-scale dilated convolutions whose kernel sizes are {1, 3, 5}, and the dilation cycle across these blocks is [1, 2, 4, 8, 16]. The dimension of step embedding is set to M = 128. To use CSI spectrograms as conditions, we set the size of STFT to 256 and the hop length to 64. We adopt a linearly spaced noise schedule where βt ∈ [10^-5, 10^-2] with diffusion steps T = 100. In the ensemble classifier, each CNN-1D network contains 3 convolutional layers whose numbers of filters are {32, 64, 128} and kernel sizes are {7, 5, 3} with strides {3, 2, 1}. Each convolutional layer is followed by a ReLU activation with a dropout rate of 0.1. After concatenation, the feature dimension becomes 256, which is the input dimension of the Transformer encoder. The Transformer encoder contains 2 encoder layers, where the number of heads is 8. The model is optimized by Adam [Kingma and Ba, 2014] with a fixed learning rate of 10^-4 and a batch size of 16. We leverage training sets to optimize ACDM for 10^5 epochs and exploit the trained ACDM to augment all three sets. The augmented training sets are used to optimize the ensemble classifier for 200 epochs.
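The quoted setup pins down the noise schedule and the classifier hyperparameters concretely enough to sketch in PyTorch (the paper's stated framework). The sketch below is a minimal reconstruction, not the authors' code: the input channel count, the window length of 256, and the pairing of exactly two CNN-1D branches (whose 128-dim features concatenate to the stated 256) are assumptions not fixed by the quote.

```python
import torch
import torch.nn as nn

# Linearly spaced noise schedule: beta_t in [1e-5, 1e-2] over T = 100 steps.
T = 100
betas = torch.linspace(1e-5, 1e-2, T)

class CNN1D(nn.Module):
    """One CNN-1D network of the ensemble classifier as quoted:
    3 conv layers, filters {32, 64, 128}, kernels {7, 5, 3}, strides {3, 2, 1},
    each followed by ReLU and dropout 0.1. in_channels is an assumption."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        channels = [in_channels, 32, 64, 128]
        kernels, strides = [7, 5, 3], [3, 2, 1]
        layers = []
        for i in range(3):
            layers += [
                nn.Conv1d(channels[i], channels[i + 1], kernels[i], stride=strides[i]),
                nn.ReLU(),
                nn.Dropout(0.1),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, in_channels, time)
        return self.net(x)         # -> (batch, 128, time')

class EnsembleHead(nn.Module):
    """Two CNN-1D branches whose 128-dim features are concatenated to 256,
    then fed to a 2-layer Transformer encoder with 8 heads. Using exactly
    two branches is an assumption; the quote only states the 256-dim input."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.branch_a = CNN1D(in_channels)
        self.branch_b = CNN1D(in_channels)
        enc_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)

    def forward(self, xa, xb):
        fa, fb = self.branch_a(xa), self.branch_b(xb)       # each (batch, 128, t)
        feats = torch.cat([fa, fb], dim=1).transpose(1, 2)  # (batch, t, 256)
        return self.encoder(feats)
```

Training would then follow the quoted optimizer settings, e.g. `torch.optim.Adam(model.parameters(), lr=1e-4)` with a batch size of 16.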