Cross-Modal Federated Human Activity Recognition via Modality-Agnostic and Modality-Specific Representation Learning

Authors: Xiaoshan Yang, Baochen Xiong, Yi Huang, Changsheng Xu

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiment results on four datasets demonstrate the effectiveness of our method.
Researcher Affiliation | Academia | 1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2 Zhengzhou University; 3 School of Artificial Intelligence, University of Chinese Academy of Sciences; 4 Peng Cheng Laboratory
Pseudocode | No | The paper describes the proposed network architecture and optimization steps textually and with diagrams, but it does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the public availability of the source code for the described methodology.
Open Datasets | Yes | Epic-Kitchens. This is the largest public multimodal dataset in egocentric HAR (Damen et al. 2020). Multimodal-EA. This is an earlier multimodal dataset for egocentric HAR (Song et al. 2016b). Stanford-ECM. This dataset contains 31 hours of egocentric videos with sensor signals of 23 activities (Nakamura et al. 2017).
Dataset Splits | No | For all the above four datasets, we randomly split local instances on each client into the training and test sets with a ratio of 0.75 : 0.25.
Hardware Specification | No | The paper mentions that models were trained but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The overall framework of our method is implemented with Pytorch (Paszke et al. 2019).
Experiment Setup | Yes | Our model and baselines are all trained with SGD optimizer, where the weight decay is set to 1e-5 and the momentum is set to 0.9. On the Epic-Kitchens, the learning rate η of the local client is set to 0.001 and the batch size is set to 64. On the other three datasets, the learning rate and the batch size are set to 0.01 and 32, respectively. On all four datasets, the number of local epochs ε is set to 2, and the number of communication rounds T is 300. In our method, both the altruistic encoder and the egocentric encoder are implemented as two-layer perceptrons with the activation function of ReLU, where the dimension of the hidden layer and the output dimension d are all set to 1024. For the shared activity classifier and the private classifier, the dimension of the hidden layer is set to 1024. For the modality discriminator, the output dimension d of the single-layer perceptron φ is set to 128. The margin factor τ and scale factor α of the additive angular margin loss in Eq. (4) are set to 0.5 and 72. The margin factor ν of the spreadout regularizer in Eq. (7) is set to 1.5. The balance weights γ1 and γ2 of the loss function are set to 0.6 and 0.4.
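To make the "Experiment Setup" and "Dataset Splits" rows concrete, the sketch below shows one plausible PyTorch reading of the reported local-client configuration: two-layer perceptron encoders with ReLU (hidden and output dimensions 1024), an SGD optimizer with weight decay 1e-5 and momentum 0.9, and a 0.75 : 0.25 random train/test split of each client's local instances. This is a minimal illustration, not the authors' released code; `input_dim`, `num_classes`, and the synthetic local dataset are placeholder assumptions, and the learning rate/batch size shown follow the three smaller datasets (Epic-Kitchens uses 0.001 and 64 instead).

```python
# Minimal sketch of the reported local-client setup (assumptions noted below).
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split


def build_encoder(input_dim: int, hidden_dim: int = 1024, out_dim: int = 1024) -> nn.Module:
    """Two-layer perceptron with ReLU, matching the reported encoder dimensions (1024/1024)."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, out_dim),
    )


def split_local_data(dataset, train_ratio: float = 0.75):
    """Randomly split a client's local instances into train/test with a 0.75 : 0.25 ratio."""
    n_train = int(len(dataset) * train_ratio)
    return random_split(dataset, [n_train, len(dataset) - n_train])


# Placeholder feature/class dimensions and a synthetic local dataset (not from the paper).
input_dim, num_classes = 2048, 23
local_data = TensorDataset(torch.randn(100, input_dim), torch.randint(0, num_classes, (100,)))
train_set, test_set = split_local_data(local_data)

encoder = build_encoder(input_dim)
# Classifier head with a 1024-dim hidden layer, as reported; output size is a placeholder.
classifier = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, num_classes))

# SGD with weight decay 1e-5 and momentum 0.9; lr 0.01 as used on the three smaller datasets.
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()),
    lr=0.01, momentum=0.9, weight_decay=1e-5,
)
```

Local training would then run 2 epochs per communication round for 300 rounds, per the reported schedule; the federated aggregation step and the additive angular margin / spreadout losses are omitted here because the row only reports their hyperparameters, not their exact formulation.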