CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework

Authors: Rui Li, Liyang He, Qi Liu, Yuze Zhao, Zheng Zhang, Zhenya Huang, Yu Su, Shijin Wang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well.
Researcher Affiliation | Collaboration | Rui Li (1,2), Liyang He (1,2), Qi Liu (1,2)*, Yuze Zhao (1,2), Zheng Zhang (1,2), Zhenya Huang (1,2), Yu Su (3), Shijin Wang (2,4). (1) Anhui Province Key Laboratory of Big Data Analysis and Application & School of Computer Science and Technology, University of Science and Technology of China; (2) State Key Laboratory of Cognitive Intelligence; (3) School of Computer Science and Artificial Intelligence, Hefei Normal University; (4) iFLYTEK AI Research (Central China), iFLYTEK Co., Ltd. Emails: {ruili2000, heliyang, yuzezhao, zhangzheng}@mail.ustc.edu.cn, {huangzhy, qiliuql}@ustc.edu.cn, yusu@hfnu.edu.cn, sjwang3@iflytek.com
Pseudocode | No | The paper describes algorithms (e.g., the Confusion-Matrix-Guided Sampling Algorithm) but does not present them in a structured pseudocode block or algorithm box. (An illustrative sampling sketch is given below the table.)
Open Source Code | Yes | Our source code is available at https://github.com/smsquirrel/consider.
Open Datasets | Yes | Since we need to evaluate model performance in a multilingual environment, we have chosen CodeSearchNet (Husain et al. 2019) as our dataset.
Dataset Splits | Yes | Table 1: CodeSearchNet dataset statistics (Training / Validation / Test / Codebase): Ruby 2.5K / 1.4K / 1.2K / 4.4K; JavaScript 5.8K / 3.9K / 3.3K / 13.9K; Go 16.7K / 7.3K / 8.1K / 28.1K; Python 25.2K / 13.9K / 14.9K / 43.8K; Java 16.4K / 5.2K / 10.9K / 40.3K; PHP 24.1K / 13.0K / 14.0K / 52.7K. (A split-counting sketch is given below the table.)
Hardware Specification | Yes | All experiments are conducted using two Tesla A100 GPUs.
Software Dependencies | No | Our CONSIDER framework is implemented in PyTorch. For all models, we map the final output dimensions to 768, utilizing the AdamW optimizer (Loshchilov and Hutter 2017). (No specific version numbers are provided for PyTorch, AdamW, or other software libraries beyond their names.)
Experiment Setup | Yes | Batch size, learning rate, and training steps are set to 256, 2e-5, and 50K respectively. The maximum sequence lengths for text and code are set to 128 and 320 respectively. All experiments are conducted using two Tesla A100 GPUs. We consider hyperparameters α within {0.5, 0.6, 0.7} and β within {1.5, 1.75, 2.0}. We conduct a grid search across various scenarios to identify their optimal combinations. (A configuration sketch is given below the table.)
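
The paper does not provide pseudocode for its Confusion-Matrix-Guided Sampling, so the following is only a hedged sketch of what such a step might look like: negatives are drawn with probabilities proportional to the row of a language-level confusion matrix, under the assumption that more confusable languages should be sampled more often. The function name, arguments, smoothing term, and language-level granularity are illustrative assumptions, not the paper's exact procedure or its released code.

```python
import numpy as np

def confusion_guided_negatives(confusion, anchor_lang, langs, pool_by_lang,
                               k=8, smoothing=1e-3, rng=None):
    """Sample k negative code snippets, biased toward languages that the
    current model confuses with `anchor_lang` (illustrative sketch only)."""
    rng = rng or np.random.default_rng()
    row = confusion[langs.index(anchor_lang)].astype(float) + smoothing
    probs = row / row.sum()                       # language-level sampling distribution
    negatives = []
    for lang in rng.choice(langs, size=k, p=probs):
        pool = pool_by_lang[lang]                 # candidate snippets of that language
        negatives.append(pool[rng.integers(len(pool))])
    return negatives

# Toy usage with a placeholder confusion matrix (uniform => uniform sampling).
langs = ["ruby", "javascript", "go", "python", "java", "php"]
confusion = np.ones((len(langs), len(langs)))
pool_by_lang = {l: [f"{l}_snippet_{i}" for i in range(100)] for l in langs}
print(confusion_guided_negatives(confusion, "python", langs, pool_by_lang, k=4))
```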
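
Since Table 1 reports filtered split sizes, a quick way to sanity-check a local copy of CodeSearchNet is to count examples per language and partition. This sketch assumes the original archive layout `<root>/<lang>/final/jsonl/<split>/*.jsonl.gz`; counts from the raw dump will be larger than the filtered numbers quoted above, and the glob should be adjusted if your copy is organised differently.

```python
import gzip
from collections import Counter
from pathlib import Path

def count_split_sizes(root):
    """Count examples per (language, split) in a CodeSearchNet-style dump."""
    counts = Counter()
    for path in Path(root).glob("*/final/jsonl/*/*.jsonl.gz"):
        lang, split = path.parts[-5], path.parts[-2]    # e.g. python / train
        with gzip.open(path, "rt", encoding="utf-8") as f:
            counts[(lang, split)] += sum(1 for _ in f)  # one JSON object per line
    return counts

if __name__ == "__main__":
    for (lang, split), n in sorted(count_split_sizes("data/codesearchnet").items()):
        print(f"{lang:>10} {split:>6} {n:>8,}")
```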
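
For readers re-running the setup, the reported hyperparameters can be collected into a single configuration object. The dataclass below is only an illustrative grouping of the values quoted above (batch size 256, learning rate 2e-5, 50K steps, sequence lengths 128/320, output dimension 768, AdamW, and the α/β grid); the class and field names are assumptions, not taken from the released code.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class TrainConfig:
    # Values quoted in the Experiment Setup row above.
    batch_size: int = 256
    learning_rate: float = 2e-5
    training_steps: int = 50_000
    max_text_len: int = 128        # maximum query/text sequence length
    max_code_len: int = 320        # maximum code sequence length
    output_dim: int = 768
    optimizer: str = "AdamW"
    alpha: float = 0.5             # searched over {0.5, 0.6, 0.7}
    beta: float = 1.5              # searched over {1.5, 1.75, 2.0}

# Grid of (alpha, beta) combinations considered in the paper's search.
ALPHA_BETA_GRID = list(product([0.5, 0.6, 0.7], [1.5, 1.75, 2.0]))
```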