SALSA VERDE: a machine learning attack on LWE with sparse small secrets
Authors: Cathy Li, Emily Wenger, Zeyuan Allen-Zhu, Francois Charton, Kristin E. Lauter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using improved preprocessing and secret recovery techniques, VERDE can attack LWE with larger dimensions (n = 512) and smaller moduli (log2 q = 12 for n = 256), using less time and power. We propose novel architectures for scaling. Finally, we develop a theory that explains the success of ML LWE attacks. |
| Researcher Affiliation | Collaboration | Cathy Yuanchen Li (FAIR, Meta); Emily Wenger (The University of Chicago); Zeyuan Allen-Zhu (FAIR, Meta); Francois Charton (FAIR, Meta); Kristin Lauter (FAIR, Meta) |
| Pseudocode | No | The paper describes the attack methodology and secret recovery steps in narrative text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and parameters to reproduce our main experiments are included in the supplementary material. The full code base will be open-sourced. |
| Open Datasets | No | Like PICANTE, VERDE starts with 4n LWE samples with the same secret s. In practice, this data would be eavesdropped. (The paper describes how it generates its own LWE samples for experiments and does not specify a pre-existing publicly available dataset, nor does it provide a link or citation for one. A hedged sample-generation sketch appears after the table.) |
| Dataset Splits | Yes | The 4 million reduced LWE pairs are used to train a transformer... VERDE runs the distinguisher on a held-out subset of 128 preprocessed vectors a_test. (A sketch of the distinguisher idea appears after the table.) |
| Hardware Specification | Yes | Our models train on one NVIDIA V100 32GB GPU and often succeed in the first epoch for low h. |
| Software Dependencies | No | The paper mentions specific software and algorithms like 'BKZ (as implemented in fplll [27])', 'BKZ 2.0 [19]', and 'Adam optimizer [39]' but does not provide explicit version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Model training is framed as a translation task, from a sequence of 2n tokens representing a to a sequence of 2 tokens representing b (see [40, 12] for similar uses of transformers for mathematical calculations). The model is trained to minimize the cross-entropy between model prediction and the sequence of tokens representing b, using the Adam optimizer with warmup [39] and a learning rate of 10^-5. For n = 256, 350 and 512, each epoch uses 2 million LWE samples and runs for 1.5, 1.6, or 2.5 hours. Time/epoch doesn't vary with q or secret type. (A sketch of the two-token encoding appears after the table.) |
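
For the Open Datasets row above, here is a minimal sketch of how a set of 4n LWE pairs sharing one sparse binary secret could be generated. The parameter defaults, the Hamming weight `h`, and the Gaussian error model are illustrative assumptions, not the paper's released sampling code.

```python
import numpy as np

def gen_lwe_samples(n=256, log2_q=23, h=10, num_samples=None, sigma=3.0, seed=0):
    """Generate LWE pairs (A, b) that share one sparse binary secret s.

    Illustrative sketch only: the error distribution and defaults are
    assumptions, not the paper's exact sampling code.
    """
    rng = np.random.default_rng(seed)
    q = 1 << log2_q
    num_samples = num_samples or 4 * n              # "4n LWE samples with the same secret s"
    # Sparse binary secret with Hamming weight h.
    s = np.zeros(n, dtype=np.int64)
    s[rng.choice(n, size=h, replace=False)] = 1
    # Uniform a vectors, small error e, and b = a.s + e mod q.
    A = rng.integers(0, q, size=(num_samples, n), dtype=np.int64)
    e = np.rint(rng.normal(0.0, sigma, size=num_samples)).astype(np.int64)
    b = (A @ s + e) % q
    return A, b, s

A, b, s = gen_lwe_samples()
print(A.shape, b.shape, int(s.sum()))               # (1024, 256) (1024,) 10
```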
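
The Dataset Splits row notes that VERDE runs a distinguisher on 128 held-out preprocessed vectors a_test. Below is a hedged sketch of the general idea behind a distinguisher-style bit test as described in the SALSA line of work: if secret coordinate i is zero, perturbing the i-th entry of a should barely change the model's prediction of b. The `model_predict` interface and the threshold `tau` are hypothetical placeholders.

```python
import numpy as np

def distinguish_secret_bits(model_predict, a_test, q, tau):
    """Guess which secret coordinates are nonzero (sketch, not the paper's code).

    model_predict(A) -> array of predicted b values (hypothetical interface).
    If s[i] == 0, shifting column i leaves a.s unchanged, so predictions
    should barely move; if s[i] != 0, they should shift noticeably.
    """
    m, n = a_test.shape
    base = model_predict(a_test).astype(np.int64)
    bits = np.zeros(n, dtype=np.int64)
    for i in range(n):
        perturbed = a_test.copy()
        perturbed[:, i] = (perturbed[:, i] + q // 2) % q   # large shift on one coordinate
        delta = np.abs(model_predict(perturbed).astype(np.int64) - base)
        delta = np.minimum(delta, q - delta)               # circular distance mod q
        bits[i] = int(np.median(delta) > tau)              # tau: hypothetical decision threshold
    return bits
```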
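
The Experiment Setup row frames training as translation from 2n tokens for a to 2 tokens for b, i.e. two tokens per integer. Here is a minimal sketch of one such two-digit encoding, using base B = ceil(sqrt(q)) so that every residue mod q fits in two digits; the base and vocabulary the paper actually uses may differ.

```python
import math

def encode_int(x, q):
    """Encode an integer mod q as two base-B digit tokens, B = ceil(sqrt(q)).

    Assumption for illustration: the paper's exact base choice may differ.
    """
    B = math.isqrt(q - 1) + 1
    return [x // B, x % B]

def encode_sample(a, b, q):
    """Map an LWE pair (a, b) to (source, target) token sequences: 2n tokens -> 2 tokens."""
    src = [tok for a_i in a for tok in encode_int(a_i, q)]
    tgt = encode_int(b, q)
    return src, tgt

q = 1 << 23
src, tgt = encode_sample([5, 123456, q - 1], 42, q)
print(len(src), tgt)   # 6 source tokens for n = 3, and a 2-token target
```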