Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples

Authors: Moustapha M. Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

NeurIPS 2017

Reproducibility Assessment (Variable / Result / LLM Response)

Variable: Research Type
Result: Experimental
LLM Response: "We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve higher success rate than those based on the traditional surrogates used to train the models while using a less perceptible adversarial perturbation."

Variable: Researcher Affiliation
Result: Collaboration
LLM Response: Moustapha Cisse (Facebook AI Research, moustaphacisse@fb.com); Yossi Adi* (Bar-Ilan University, Israel, yossiadidrum@gmail.com); Natalia Neverova* (Facebook AI Research, nneverova@fb.com); Joseph Keshet (Bar-Ilan University, Israel, jkeshet@cs.biu.ac.il)

Variable: Pseudocode
Result: No
LLM Response: The paper provides mathematical equations and descriptions of the proposed loss function and its gradient, but it does not include any structured pseudocode or algorithm blocks.

Variable: Open Source Code
Result: No
LLM Response: The paper does not provide a link to its source code or explicitly state that the code for the methodology is openly available.

Variable: Open Datasets
Result: Yes
LLM Response: "We perform the evaluations on the validation subset of MPII dataset [3] consisting of 3000 images and defined as in [26]. [...] We use a pre-trained Dilation10 model for semantic segmentation [38] and evaluate the success of the attacks on the validation subset of Cityscapes dataset [8]. [...] The model achieves 12% Word Error Rate and 1.5% Character Error Rate on the Librispeech dataset [27], with no additional language modeling."

Variable: Dataset Splits
Result: No
LLM Response: The paper mentions using the "validation subset" of the MPII and Cityscapes datasets and the "clean test set" of Librispeech, but it does not specify the exact percentages or counts for training, validation, and test splits needed to reproduce the data partitioning.

Variable: Hardware Specification
Result: No
LLM Response: The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory, cloud instances) used for running the experiments.

Variable: Software Dependencies
Result: No
LLM Response: The paper mentions specific models and architectures (e.g., a "Deep Speech-2 based architecture [1]", "Hourglass networks [26]", the "Dilation10 model [38]") and loss functions (e.g., the "Connectionist Temporal Classification (CTC) loss function [13]"), but it does not specify version numbers for these or for other software dependencies such as programming languages or libraries.
Variable: Experiment Setup
Result: Yes
LLM Response: "We perform the optimization iteratively till convergence with an update rule x ← x + ϵ·∇x, where ∇x are the gradients with respect to the input and ϵ = 0.1. [...] We used a perturbation of magnitude ϵ = 0.05 and sampled 20 models for Probit (therefore 20× more computationally expensive than Houdini). [...] The model gets as input raw spectrograms (extracted using a window size of 25 ms, a frame shift of 10 ms, and a Hamming window), and outputs a transcript."
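The iterative update rule quoted above can be sketched as a generic gradient-ascent attack loop. This is a minimal illustration, not the paper's implementation: the function names (`iterative_attack`, `grad_fn`) and the `max_iters`/`tol` convergence criterion are hypothetical, and the sign-of-gradient step is one common convention for FGSM-style attacks; the paper only specifies iterating until convergence with step size ϵ = 0.1.

```python
import numpy as np

def iterative_attack(x, grad_fn, epsilon=0.1, max_iters=100, tol=1e-6):
    """Iteratively perturb input x to increase an attack loss.

    grad_fn(x) is a hypothetical stand-in for backpropagation through
    the target network: it returns the gradient of the attack loss
    (e.g., a Houdini-style surrogate) with respect to the input.
    """
    x_adv = x.copy()
    for _ in range(max_iters):
        grad = grad_fn(x_adv)
        if np.linalg.norm(grad) < tol:
            break  # gradient vanished: treat as converged
        # sign step of magnitude epsilon (assumed FGSM-style convention)
        x_adv = x_adv + epsilon * np.sign(grad)
    return x_adv
```

In a real attack, `grad_fn` would be supplied by automatic differentiation, and the final perturbation `x_adv - x` could additionally be clipped to keep it imperceptible.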
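The spectrogram front end described in the quoted setup (25 ms window, 10 ms frame shift, Hamming window) can be reproduced with basic NumPy framing plus a real FFT. A sketch under stated assumptions: the 16 kHz sample rate is assumed (typical for Librispeech) and the function name is hypothetical; the paper specifies only the window size, frame shift, and window type.

```python
import numpy as np

def raw_spectrogram(signal, sample_rate=16000, win_ms=25, hop_ms=10):
    """Magnitude spectrogram: Hamming-windowed frames followed by |rFFT|."""
    win = int(sample_rate * win_ms / 1000)   # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)   # 160 samples at 16 kHz
    window = np.hamming(win)
    n_frames = 1 + (len(signal) - win) // hop
    # slice overlapping frames and apply the window to each
    frames = np.stack([signal[i * hop : i * hop + win] * window
                       for i in range(n_frames)])
    # real FFT per frame gives win // 2 + 1 frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a one-second 16 kHz signal this yields 98 frames of 201 frequency bins each; a Deep Speech-2-style model would consume such a time-frequency matrix directly.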