Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DoDo-Code: an Efficient Levenshtein Distance Embedding-based Code for 4-ary IDS Channel

Authors: Alan J.X. Guo, Sihan Sun, Xiang Wei, Mengyi Wei, Xin Chen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this study, a novel method is introduced for designing high-code-rate single-IDS-correcting codewords through deep Levenshtein distance embedding. A deep learning model is utilized to project the sequences into embedding vectors that preserve the Levenshtein distances between the original sequences. This embedding space serves as a proxy for the complex Levenshtein domain, within which algorithms for codeword search and segment correcting are developed. The proposed method results in a code rate that outperforms existing combinatorial solutions, particularly for designing short-length codewords. The paper also includes sections such as "5 Experiments and Results", "5.1 Codewords in the embedding space", "5.2 Code rate and optimality", "5.3 Ablation study on embedding space searching and revised PNLL loss", and "5.4 Success rate and experimental time complexity of segment correcting", which demonstrate empirical evaluation.
Researcher Affiliation | Academia | (1) Center for Applied Mathematics, KL-AAGDM, Tianjin University, Tianjin 300072, China; (2) State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 300072, China
Pseudocode | Yes | Algorithm 1 Deep embedding-based greedy search of codewords
Open Source Code | Yes | The source code is available at https://github.com/aalennku/DoDo-Code.
Open Datasets | No | All the sequences used for training and testing are generated randomly. The ground-truth Levenshtein distance is obtained by a Python module called Levenshtein. Therefore, the experiments run independently of any specific dataset and generate the data on their own.
Dataset Splits | No | All the sequences used for training and testing are generated randomly. The ground-truth Levenshtein distance is obtained by a Python module called Levenshtein. Therefore, the experiments run independently of any specific dataset and generate the data on their own.
Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments (e.g., GPU/CPU models, memory details).
Software Dependencies | No | The paper mentions a "Python module called Levenshtein" but does not specify a version number. Other mentions are of algorithms or architectures, not specific software dependencies with versions.
Experiment Setup | Yes | The embedding model utilizes an architecture of 10 stacked 1D-CNNs, with the embedding vector dimension set to 64. The loss function is revised to emphasize the approximations between sequence pairs within Levenshtein balls of radius 2. For the visualization in Figure 3, experiments are performed with a simplified setting where the codeword length is set to N = 10, the code is reduced from a 4-ary alphabet to a binary alphabet, and the embedding dimension is reduced to 8. Code rates are compared across 10 runs. Experiments on segment correcting were conducted with different values of k (the number of queried neighbors).
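
The Open Datasets and Pseudocode entries describe randomly generated sequences labeled with ground-truth Levenshtein distances, and a greedy search of codewords. The following is a minimal Python sketch of those two ideas under stated assumptions; it is not the authors' Algorithm 1, which searches in a learned embedding space, whereas this sketch searches the Levenshtein domain directly by brute force. The alphabet "ACGT", all function names, and the pure-Python distance routine (a stand-in for the Levenshtein module the paper mentions) are hypothetical choices made for illustration. The minimum-distance threshold of 3 is likewise an assumption: because edit distance is a metric, pairwise distance of at least 3 is sufficient for radius-1 balls around codewords to be disjoint.

```python
import random
from itertools import product

# Assumed 4-ary alphabet; the report only states that the code is 4-ary.
ALPHABET = "ACGT"


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertion, deletion, substitution) via a two-row DP.

    Pure-Python stand-in for a library distance function.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def random_pairs(n_pairs: int, length: int, seed: int = 0):
    """Generate random sequence pairs labeled with their edit distance."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = "".join(rng.choice(ALPHABET) for _ in range(length))
        b = "".join(rng.choice(ALPHABET) for _ in range(length))
        pairs.append((a, b, levenshtein(a, b)))
    return pairs


def greedy_codebook(length: int, min_dist: int = 3):
    """Brute-force greedy search: keep a candidate only if it is at least
    min_dist away from every codeword already selected."""
    codebook = []
    for candidate in map("".join, product(ALPHABET, repeat=length)):
        if all(levenshtein(candidate, c) >= min_dist for c in codebook):
            codebook.append(candidate)
    return codebook


if __name__ == "__main__":
    for a, b, d in random_pairs(3, 8):
        print(a, b, d)
    print(len(greedy_codebook(4)), "codewords of length 4")
```

This exhaustive variant enumerates all 4^N candidates and compares each against the growing codebook, so it is only practical for very short lengths.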