Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Interpreting Emergent Features in Deep Learning-based Side-channel Analysis

Authors: Sengim Karayalcin, Marina Krček, Stjepan Picek

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section presents results for three common public SCA targets CHES_CTF, ESHARD, and ASCAD (see Appendix B for details). The models are Multilayer Perceptron (MLP) neural networks with their hyperparameters taken from [35] for ESHARD and ASCAD (see Appendix C). For CHES_CTF, we directly train the ESHARD model without additional hyperparameter tuning. Note that we focus on MLP and CNN architectures as these are generally sufficient for state-of-the-art performance in SCA [34]. The analyses given here are similar (although somewhat more cumbersome) for CNNs (see Appendix G).
Researcher Affiliation	Academia	Sengim Karayalçin, Leiden University, The Netherlands (EMAIL); Marina Krček, Radboud University, The Netherlands (EMAIL); Stjepan Picek, University of Zagreb Faculty of Electrical Engineering and Computing, Croatia & Radboud University, The Netherlands (EMAIL)
Pseudocode	No	The paper describes the analysis approach in Section 3 and illustrates it with Figure 1, but it does not present any formal pseudocode or algorithm blocks. The analysis steps are described in prose.
Open Source Code	Yes	Code to reproduce experiments is available at https://github.com/Sengim/feature_emergence.
Open Datasets	Yes	We utilize publicly available datasets commonly used in SCA literature for benchmarking. These datasets implement AES-128 with Boolean masking protection. The attack set consists of 10 000 traces for each dataset. CHES CTF 2018 [19] consists of power consumption measurements from an AES-128 implementation running on ARM Cortex-M4 (32 bits). ESHARD-AES128 [47] consists of EM measurements from a software-masked AES-128 implementation running on an ARM Cortex-M4 device. ASCAD [2] measures EM emissions from an AES-128 implementation on AVR RISC (8 bits).
Dataset Splits	Yes	The attack set consists of 10 000 traces for each dataset. For CHES CTF: The profiling set has 30 000 traces. For ESHARD: This dataset contains 100 000 measurements with 90 000 traces for the profiling set. For ASCAD: 200 000 traces are used for profiling.
Hardware Specification	Yes	Training these models takes under an hour on a desktop workstation with 64GB RAM and an NVIDIA 4080 GPU.
Software Dependencies	No	The paper mentions using Multilayer Perceptron (MLP) and CNN architectures, Adam optimizer, relu and elu activations, but does not provide specific version numbers for any software libraries or programming languages used.
Experiment Setup	Yes	The model for CHES_CTF and ESHARD is a 4-layer MLP with 40 neurons in each layer with he_uniform weight initialization. We use relu activations. We use the Adam optimizer with a learning rate of 0.0025 and L1 regularization set to 0.000075. The batch size is 400, and we train for 200 epochs for CHES_CTF and 100 for ESHARD. For ASCAD, the model is a 6-layer MLP with 100 neurons in each layer with random_uniform weight initialization. We use the Adam optimizer with a learning rate of 0.0005. We use elu activations, and we again train for 100 epochs with a batch size of 400.
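
The Dataset Splits and Experiment Setup rows above can be collected into one small configuration, which makes the implied training budget easy to check. The sketch below is illustrative only and is not taken from the authors' repository: the `MLPSetup` class, its field names, and the `SETUPS` dictionary are hypothetical, and it assumes that the full profiling set is used for training (no validation hold-out) and that a final partial batch would be kept. Values the quoted text does not state, such as L1 regularization for ASCAD, are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MLPSetup:
    """Hyperparameters as quoted in the table rows above (hypothetical container)."""
    hidden_layers: int
    neurons: int
    activation: str
    initializer: str
    learning_rate: float
    l1: Optional[float]      # None where the quoted text gives no value
    batch_size: int
    epochs: int
    profiling_traces: int    # assumption: all profiling traces are used for training

    def steps_per_epoch(self) -> int:
        # Ceiling division, assuming a final partial batch is kept.
        return -(-self.profiling_traces // self.batch_size)

    def total_updates(self) -> int:
        # Number of optimizer steps over the whole training run.
        return self.steps_per_epoch() * self.epochs


SETUPS = {
    "CHES_CTF": MLPSetup(4, 40, "relu", "he_uniform", 0.0025, 0.000075, 400, 200, 30_000),
    "ESHARD": MLPSetup(4, 40, "relu", "he_uniform", 0.0025, 0.000075, 400, 100, 90_000),
    "ASCAD": MLPSetup(6, 100, "elu", "random_uniform", 0.0005, None, 400, 100, 200_000),
}

if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(name, setup.steps_per_epoch(), setup.total_updates())
```

Under these assumptions, CHES_CTF runs 75 steps per epoch (15 000 updates in total), ESHARD 225 (22 500), and ASCAD 500 (50 000). Input dimensions, output classes, and library versions are not given in the rows above, so they are deliberately left out.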