Neural Embeddings for kNN Search in Biological Sequence
Authors: Zhihao Chang, Linzhu Yu, Yanchao Xu, Wentao Hu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our Bio-kNN significantly outperforms the state-of-the-art methods on two large-scale datasets without increasing the training cost. |
| Researcher Affiliation | Academia | Zhihao Chang¹, Linzhu Yu², Yanchao Xu², Wentao Hu³. ¹The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China; ²College of Computer Science and Technology, Zhejiang University, Hangzhou, China; ³Zhejiang Police College, Hangzhou, China |
| Pseudocode | No | The paper describes methods in prose and with mathematical formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and datasets are available at https://github.com/Proudc/Bio-KNN. |
| Open Datasets | Yes | We evaluate our neural embeddings through the utilization of two extensively recognized datasets (Dai et al. 2020; Zhang, Yuan, and Indyk 2019), i.e., Uniprot and Uniref. These datasets exhibit varying sizes and sequence lengths, and their properties are shown in Table 1. |
| Dataset Splits | Yes | Consistent with existing works, we partition each dataset into distinct subsets, namely the training set, query set, and base set. Both the training set and the query set are composed of 1,000 sequences, and the other items belong to the base set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions 'We use the EMBOSS to compute the NW distance between sequences' and provides a GitHub link for code and datasets, but it does not specify version numbers for EMBOSS or any other software dependencies. |
| Experiment Setup | No | The paper provides some details like 'set the split interval δ = 100' and mentions using 'the CNN submodule in CNNED', but it lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or a comprehensive description of the experimental setup in the main text. |
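
The Dataset Splits row describes a three-way partition: 1,000 training sequences, 1,000 query sequences, and everything else in the base set. A minimal sketch of such a partition is shown below. The function name `split_dataset`, the random shuffle, and the fixed seed are assumptions made for illustration; the paper does not state how the sequences for each subset are selected, and the released code may do this differently.

```python
import random


def split_dataset(sequences, n_train=1000, n_query=1000, seed=0):
    """Partition sequences into training, query, and base sets.

    Hypothetical helper: the shuffle-based selection and the seed are
    assumptions, not the procedure reported in the paper.
    """
    if len(sequences) < n_train + n_query:
        raise ValueError("not enough sequences for the requested split")

    # Shuffle indices rather than the input list, so the caller's data is untouched.
    rng = random.Random(seed)
    indices = list(range(len(sequences)))
    rng.shuffle(indices)

    train = [sequences[i] for i in indices[:n_train]]
    query = [sequences[i] for i in indices[n_train:n_train + n_query]]
    # All remaining sequences form the base set that kNN queries search over.
    base = [sequences[i] for i in indices[n_train + n_query:]]
    return train, query, base
```

Calling `split_dataset(all_sequences)` on a list of sequence strings returns three disjoint lists. Fixing the seed keeps the split repeatable across runs, which matters when distances for the training set are precomputed once and reused.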