Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
BlockScan: Detecting Anomalies in Blockchain Transactions
Authors: Jiahao Yu, Xian Wu, Hao Liu, Wenbo Guo, Xinyu Xing
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on Ethereum and Solana transactions demonstrate BlockScan's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockScan is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieved very low or zero detection recall scores. This work sets a new benchmark for applying Transformer-based approaches in blockchain data analysis. |
| Researcher Affiliation | Collaboration | 1 UC Santa Barbara 2 Meta AI 3 New York University 4 sec3 5 Northwestern University |
| Pseudocode | Yes | We present the pseudo algorithm of BlockScan in Algorithm 1 to help readers understand the workflow of BlockScan. Algorithm 1: Workflow of BlockScan |
| Open Source Code | Yes | The code, model, and datasets are available at an anonymous link (footnote 3). BlockScan is the first open-source and best-performing transformer-based anomaly detection for DeFi that provides a theoretical guarantee. Footnote 3: https://github.com/nuwuxian/tx_fm |
| Open Datasets | Yes | The code, model, and datasets are available at an anonymous link (footnote 3). BlockScan is the first open-source and best-performing transformer-based anomaly detection for DeFi that provides a theoretical guarantee. As far as we know, this is the open-sourced dataset for transformer-based blockchain transaction anomaly detection. Footnote 3: https://github.com/nuwuxian/tx_fm |
| Dataset Splits | Yes | We sample transactions from interactions with 5 DeFi applications for Ethereum and 10 applications for Solana to ensure diverse transaction patterns. For each DeFi application, transactions are ordered by their block timestamps and split into 80% for training and 20% for evaluation as benign transactions. This per-application sequential split is crucial to prevent time travel data leakage, ensuring that the model is trained exclusively on past data without access to future information. Specifically, our Ethereum dataset consists of 3,383 benign transactions for training, 709 benign transactions for testing, and 10 malicious transactions. The data was collected from October 2020 to April 2023. For Solana, our training dataset comprises 35,115 transactions, while the testing dataset includes 1,500 benign transactions and 18 malicious transactions. |
| Hardware Specification | Yes | The Solana model was trained over two days using eight A100 GPUs, while the Ethereum model required around 2 hours of training on the same hardware. Without FlashAttention, the model cannot handle even a batch size of 1 on an 80GB A100 GPU due to memory constraints. With FlashAttention, the training process becomes feasible, allowing a batch size of 2 while maintaining memory efficiency. |
| Software Dependencies | No | For the Doc2Vec approach, as described by [7], we first apply Doc2Vec [30] to extract features from the pre-processed and flattened traces of training transactions, as is shown in Figure 1. After obtaining the feature representations, we build a GMM to model the training transactions distribution using the Sklearn library [34] with default hyper-parameters. Optimizer: Adam [33]; base learning rate: 5e-5. |
| Experiment Setup | Yes | The complete set of training hyper-parameters is detailed in Table 4 and Table 5. Table 4: Configuration of training setup on Solana. Table 5: Configuration of training setup on Ethereum. |
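
The per-application sequential split quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not taken from the BlockScan repository: the function name `split_by_app_chronologically`, the record fields `app`, `timestamp`, and `tx_id`, and the toy data are all hypothetical, chosen only to show how ordering each application's transactions by timestamp before cutting at 80% keeps every training transaction earlier than every evaluation transaction of the same application.

```python
from collections import defaultdict


def split_by_app_chronologically(transactions, train_fraction=0.8):
    """Split each application's transactions by time.

    The earliest `train_fraction` of every application's transactions goes
    to the training set and the remainder to the evaluation set.
    Field names ("app", "timestamp") are hypothetical.
    """
    # Group transactions by the application they interact with.
    by_app = defaultdict(list)
    for tx in transactions:
        by_app[tx["app"]].append(tx)

    train, test = [], []
    for app_txs in by_app.values():
        # Order by block timestamp so the cut separates past from future.
        app_txs.sort(key=lambda tx: tx["timestamp"])
        cut = int(len(app_txs) * train_fraction)
        train.extend(app_txs[:cut])
        test.extend(app_txs[cut:])
    return train, test


# Toy data: two hypothetical applications with ten unordered transactions each.
transactions = [
    {"app": app, "timestamp": ts, "tx_id": f"{app}-{ts}"}
    for app in ("app_a", "app_b")
    for ts in (50, 10, 40, 20, 30, 70, 60, 90, 80, 100)
]

train, test = split_by_app_chronologically(transactions)
print(len(train), len(test))
```

Splitting within each application, rather than over the pooled data, means an application with few or recent transactions still contributes to both sets, while the timestamp ordering is what rules out training on information from the future.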