SALSA VERDE: a machine learning attack on LWE with sparse small secrets
Authors: Cathy Li, Emily Wenger, Zeyuan Allen-Zhu, Francois Charton, Kristin E. Lauter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using improved preprocessing and secret recovery techniques, VERDE can attack LWE with larger dimensions (n = 512) and smaller moduli (log2 q = 12 for n = 256), using less time and power. We propose novel architectures for scaling. Finally, we develop a theory that explains the success of ML LWE attacks. |
| Researcher Affiliation | Collaboration | Cathy Yuanchen Li (FAIR, Meta); Emily Wenger (The University of Chicago); Zeyuan Allen-Zhu (FAIR, Meta); Francois Charton (FAIR, Meta); Kristin Lauter (FAIR, Meta) |
| Pseudocode | No | The paper describes the attack methodology and secret recovery steps in narrative text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and parameters to reproduce our main experiments are included in the supplementary material. The full code base will be open-sourced. |
| Open Datasets | No | Like PICANTE, VERDE starts with 4n LWE samples with the same secret s. In practice, this data would be eavesdropped. (The paper describes how it generates its own LWE samples for experiments and does not specify a pre-existing publicly available dataset, nor does it provide a link or citation for one. A hedged sample-generation sketch appears after the table.) |
| Dataset Splits | Yes | The 4 million reduced LWE pairs are used to train a transformer... VERDE runs the distinguisher on a held-out subset of 128 preprocessed vectors a_test. (A sketch of the distinguisher idea appears after the table.) |
| Hardware Specification | Yes | Our models train on one NVIDIA V100 32GB GPU and often succeed in the first epoch for low h. |
| Software Dependencies | No | The paper mentions specific software and algorithms like 'BKZ (as implemented in fplll [27])', 'BKZ 2.0 [19]', and 'Adam optimizer [39]' but does not provide explicit version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Model training is framed as a translation task, from a sequence of 2n tokens representing a to a sequence of 2 tokens representing b (see [40, 12] for similar uses of transformers for mathematical calculations). The model is trained to minimize the cross-entropy between model prediction and the sequence of tokens representing b, using the Adam optimizer with warmup [39] and a learning rate of 10^-5. For n = 256, 350 and 512, each epoch uses 2 million LWE samples and runs for 1.5, 1.6, or 2.5 hours. Time/epoch doesn't vary with q or secret type. (A sketch of the two-token encoding appears after the table.) |
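
For the Open Datasets row above, here is a minimal sketch of how a set of 4n LWE pairs sharing one sparse binary secret could be generated. The parameter defaults, the Hamming weight `h`, and the Gaussian error model are illustrative assumptions, not the paper's released sampling code.

```python
import numpy as np

def gen_lwe_samples(n=256, log2_q=23, h=10, num_samples=None, sigma=3.0, seed=0):
    """Generate LWE pairs (A, b) that share one sparse binary secret s.

    Illustrative sketch only: the error distribution and defaults are
    assumptions, not the paper's exact sampling code.
    """
    rng = np.random.default_rng(seed)
    q = 1 << log2_q
    num_samples = num_samples or 4 * n              # "4n LWE samples with the same secret s"
    # Sparse binary secret with Hamming weight h.
    s = np.zeros(n, dtype=np.int64)
    s[rng.choice(n, size=h, replace=False)] = 1
    # Uniform a vectors, small error e, and b = a.s + e mod q.
    A = rng.integers(0, q, size=(num_samples, n), dtype=np.int64)
    e = np.rint(rng.normal(0.0, sigma, size=num_samples)).astype(np.int64)
    b = (A @ s + e) % q
    return A, b, s

A, b, s = gen_lwe_samples()
print(A.shape, b.shape, int(s.sum()))               # (1024, 256) (1024,) 10
```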
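
The Dataset Splits row notes that VERDE runs a distinguisher on 128 held-out preprocessed vectors a_test. Below is a hedged sketch of the general idea behind a distinguisher-style bit test as described in the SALSA line of work: if secret coordinate i is zero, perturbing the i-th entry of a should barely change the model's prediction of b. The `model_predict` interface and the threshold `tau` are hypothetical placeholders.

```python
import numpy as np

def distinguish_secret_bits(model_predict, a_test, q, tau):
    """Guess which secret coordinates are nonzero (sketch, not the paper's code).

    model_predict(A) -> array of predicted b values (hypothetical interface).
    If s[i] == 0, shifting column i leaves a.s unchanged, so predictions
    should barely move; if s[i] != 0, they should shift noticeably.
    """
    m, n = a_test.shape
    base = model_predict(a_test).astype(np.int64)
    bits = np.zeros(n, dtype=np.int64)
    for i in range(n):
        perturbed = a_test.copy()
        perturbed[:, i] = (perturbed[:, i] + q // 2) % q   # large shift on one coordinate
        delta = np.abs(model_predict(perturbed).astype(np.int64) - base)
        delta = np.minimum(delta, q - delta)               # circular distance mod q
        bits[i] = int(np.median(delta) > tau)              # tau: hypothetical decision threshold
    return bits
```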
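
The Experiment Setup row frames training as translation from 2n tokens for a to 2 tokens for b, i.e. two tokens per integer. Here is a minimal sketch of one such two-digit encoding, using base B = ceil(sqrt(q)) so that every residue mod q fits in two digits; the base and vocabulary the paper actually uses may differ.

```python
import math

def encode_int(x, q):
    """Encode an integer mod q as two base-B digit tokens, B = ceil(sqrt(q)).

    Assumption for illustration: the paper's exact base choice may differ.
    """
    B = math.isqrt(q - 1) + 1
    return [x // B, x % B]

def encode_sample(a, b, q):
    """Map an LWE pair (a, b) to (source, target) token sequences: 2n tokens -> 2 tokens."""
    src = [tok for a_i in a for tok in encode_int(a_i, q)]
    tgt = encode_int(b, q)
    return src, tgt

q = 1 << 23
src, tgt = encode_sample([5, 123456, q - 1], 42, q)
print(len(src), tgt)   # 6 source tokens for n = 3, and a 2-token target
```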