Privacy-Preserving Video Classification with Convolutional Neural Networks
Authors: Sikha Pentyala, Rafael Dowsley, Martine De Cock
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed solution in an application for private human emotion recognition. Our results across a variety of security settings, spanning honest and dishonest majority configurations of the computing parties, and for both passive and active adversaries, demonstrate that videos can be classified with state-of-the-art accuracy, and without leaking sensitive user information. |
| Researcher Affiliation | Academia | (1) School of Engineering and Technology, University of Washington, Tacoma, WA, USA; (2) Faculty of Information Technology, Monash University, Clayton, Australia; (3) Dept. of Appl. Math., Computer Science and Statistics, Ghent University, Ghent, Belgium. |
| Pseudocode | Yes | Protocol 1: Protocol πFSELECT for oblivious frame selection. Input: a secret shared 4D-array [[A]] of size N × h × w × c with the frames of a video; a secret shared frame selection matrix [[B]] of size n × N. The values N, h, w, c, n are known to all parties. Output: a secret shared 4D-array [[F]] of size n × h × w × c holding the selected frames. (A plaintext sketch of this selection step is given after the table.) |
| Open Source Code | No | The paper states 'We implemented the protocols from Sec. 4 in the MPC framework MP-SPDZ (Keller, 2020)' but does not provide a link or repository giving access to their specific implementation of the described methodology. |
| Open Datasets | Yes | We use 1,248 video-only files with speech modality from this dataset, corresponding to 7 different emotions, namely neutral (96), happy (192), sad (192), angry (192), fearful (192), disgust (192), and surprised (192). The videos in the RAVDESS dataset have a duration of 3-5 seconds with 30 frames per second, hence the total number of frames per video is in the range of 120-150. We split the data into 1,116 videos for training and 132 videos for testing. |
| Dataset Splits | No | The paper states 'We split the data into 1,116 videos for training and 132 videos for testing' and describes how the test set was formed. While it mentions 'early-stopping', which implies the use of a validation set, it does not provide specific split information (percentages, counts, or an explicit standard split) for a validation set. |
| Hardware Specification | Yes | We implemented the protocols from Sec. 4 in the MPC framework MP-SPDZ (Keller, 2020), and ran experiments on co-located F32s v2 Azure virtual machines. Each of the parties (servers) ran on separate VM instances (connected with a Gigabit Ethernet network), which means that the results in the tables cover communication time in addition to computation time. An F32s v2 virtual machine contains 32 cores, 64 GiB of memory, and network bandwidth of up to 14 Gb/s. |
| Software Dependencies | No | The paper mentions using 'the MPC framework MP-SPDZ (Keller, 2020)', 'OpenCV (Bradski & Kaehler, 2008)', and 'Keras (Chollet et al., 2015)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Our video classifier samples every 15th frame, classifies it with the above ConvNet, and assigns as the final class label the label that has the highest average probability across all frames in the video. ... For Bob's image classification model, we trained a ConvNet with 1.48 million parameters with an architecture of [(CONV-RELU)-POOL]-[(CONV-RELU)*2-POOL]*2-[FC-RELU]*2-[FC-SOFTMAX]. We pre-trained the feature layers on the FER2013 data to learn to extract facial features for emotion recognition, and fine-tuned the model on the RAVDESS training data. ... With early stopping using a batch size of 256 and Adam optimizer with default parameters in Keras (Chollet et al., 2015). ... With early stopping using a batch size of 64 and SGD optimizer with a learning rate of 0.001, decay of 10^-6, and momentum of 0.9. ... For the ring Z_{2^k}, we used value k = 64. (A Keras sketch of this architecture and the frame-averaging video classifier is given after the table.) |
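The oblivious frame selection in Protocol πFSELECT reduces to multiplying a one-hot frame selection matrix B with the flattened video tensor A; in the actual protocol both operands are secret shared and the product is computed with a secure matrix-multiplication protocol inside MP-SPDZ. The plaintext NumPy sketch below only illustrates that arithmetic, not the MPC execution; the function and variable names (`one_hot_selection`, `select_frames`) are ours, not from the paper or MP-SPDZ.

```python
import numpy as np

def one_hot_selection(frame_indices, N):
    """Build the n x N selection matrix B: row i is one-hot at frame_indices[i]."""
    B = np.zeros((len(frame_indices), N))
    B[np.arange(len(frame_indices)), frame_indices] = 1.0
    return B

def select_frames(A, B):
    """Select frames via matrix multiplication (plaintext stand-in for pi_FSELECT).

    A: video as an (N, h, w, c) array.
    B: (n, N) one-hot selection matrix.
    Returns an (n, h, w, c) array holding the selected frames.
    """
    N, h, w, c = A.shape
    flat = A.reshape(N, h * w * c)   # flatten each frame to a row vector
    selected = B @ flat              # (n, N) x (N, h*w*c) -> (n, h*w*c)
    return selected.reshape(B.shape[0], h, w, c)

# Example: sample every 15th frame of a 120-frame video of 48x48 grayscale images.
A = np.random.rand(120, 48, 48, 1)
idx = np.arange(0, 120, 15)
F = select_frames(A, one_hot_selection(idx, 120))
assert F.shape == (8, 48, 48, 1)
```

In the secret-shared setting the server learns neither which frames were picked (B is hidden) nor the pixel values (A is hidden), which is why selection is phrased as a matrix product rather than plain indexing.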
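For the experiment-setup row, here is a minimal Keras sketch of a ConvNet following the quoted layer pattern, the quoted SGD fine-tuning settings, and the every-15th-frame averaging classifier. The input size (48×48 grayscale, as in FER2013), filter counts, kernel sizes, and dense-layer widths are assumptions chosen only to match the stated pattern and the rough 1.48M-parameter budget; the paper excerpt does not spell them out.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_emotion_convnet(input_shape=(48, 48, 1), num_classes=7):
    """[(CONV-RELU)-POOL]-[(CONV-RELU)*2-POOL]*2-[FC-RELU]*2-[FC-SOFTMAX].

    Filter counts, kernel sizes, and dense widths are assumptions; only the
    layer pattern and the 7 emotion classes come from the paper.
    """
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_emotion_convnet()
# Fine-tuning configuration quoted in the table: SGD with learning rate 0.001
# and momentum 0.9 (batch size 64, early stopping). The quoted decay of 1e-6
# was passed as decay=1e-6 in the Keras versions of that era.
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

def classify_video(model, frames):
    """Sample every 15th frame, classify each, and return the label with the
    highest average probability across the sampled frames."""
    sampled = frames[::15]
    probs = model.predict(sampled, verbose=0)
    return int(np.argmax(probs.mean(axis=0)))
```

With the assumed 48×48 input, this layout comes to roughly 1.49 million parameters, close to the 1.48 million quoted, but the exact hyperparameters used by the authors are not given in the excerpt.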