EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization

Authors: Dong Huang, Jianbo Dai, Han Weng, Puzhen Wu, Yuhao Qing, Heming Cui, Zhijiang Guo, Jie Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the effectiveness of EffiLearner, we conduct extensive experiments on EffiBench, HumanEval, and MBPP with 16 open-source and 6 closed-source models. Our evaluation results demonstrate that through iterative self-optimization, EffiLearner significantly enhances the efficiency of LLM-generated code."
Researcher Affiliation | Academia | Dong Huang, The University of Hong Kong, dhuang@cs.hku.hk; Jianbo Dai, University of Edinburgh, j6dj6d@gmail.com; Han Weng, Beijing University of Posts and Telecommunications, han.weng@bupt.edu.cn; Puzhen Wu, University College Dublin, puzhen.wu@ucdconnect.ie; Yuhao Qing, The University of Hong Kong, yhqing@cs.hku.hk; Heming Cui, The University of Hong Kong & Shanghai AI Laboratory, heming@cs.hku.hk; Zhijiang Guo, University of Cambridge, zg283@cam.ac.uk; Jie M. Zhang, King's College London, jie.zhang@kcl.ac.uk
Pseudocode | No | The paper describes the framework components and provides code examples in Python, but does not contain a dedicated pseudocode or algorithm block.
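Although the paper gives no algorithm block, its described workflow (profile the generated code, feed the overhead report back to the LLM, regenerate, repeat) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation; `profile_overhead` and `llm_optimize` are hypothetical stand-ins for the profiler step and the LLM call.

```python
# Hedged sketch of the iterative self-optimization loop described in the
# paper. Both helper functions below are placeholders, not real APIs.

def profile_overhead(code: str) -> str:
    """Stand-in: run the code and return a textual time/memory report."""
    return f"overhead report for: {code[:30]}"

def llm_optimize(code: str, report: str) -> str:
    """Stand-in: ask the LLM to rewrite the code given the report."""
    return code + "  # optimized"

def self_optimize(initial_code: str, steps: int = 5) -> str:
    """Repeat the profile -> feedback -> regenerate cycle `steps` times."""
    code = initial_code
    for _ in range(steps):
        report = profile_overhead(code)    # collect execution overhead
        code = llm_optimize(code, report)  # feed it back to the model
    return code
```

The paper's ablation varies `steps` from 0 to 5; `steps=0` simply returns the initial code unchanged.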
Open Source Code | Yes | "The source code of EffiLearner was released in https://github.com/huangd1999/EffiLearner."
Open Datasets | Yes | "To evaluate the effectiveness of EffiLearner, we conduct extensive experiments on EffiBench, HumanEval, and MBPP with 16 open-source and 6 closed-source models. We evaluate EffiLearner on EffiBench [27]. For HumanEval and MBPP datasets, we set the test cases provided by HumanEval and MBPP as open test cases, while test cases provided by EvalPlus [35] (i.e., HumanEval-Plus, MBPP-Plus) as private test cases that were used to calculate the final results."
Dataset Splits | Yes | "Following Huang et al. [27], we utilize the open test cases to calculate the efficiency metrics during the self-optimization process, while private test cases provided by EffiBench were used for the final result evaluation. For HumanEval and MBPP datasets, we set the test cases provided by HumanEval and MBPP as open test cases, while test cases provided by EvalPlus [35] (i.e., HumanEval-Plus, MBPP-Plus) as private test cases that were used to calculate the final results."
Hardware Specification | Yes | "All of the experiments are conducted on an edge server with an Intel Xeon Platinum 8336C CPU (128 cores), 8x NVIDIA A100-SXM GPUs, and a total memory capacity of 2.0 TiB."
Software Dependencies | No | The paper mentions using Python, the line_profiler library, and the memory_profiler library, but does not provide specific version numbers for these software components.
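The paper profiles generated code with line_profiler and memory_profiler (versions unspecified). For a reproduction attempt, the same kind of time/memory overhead measurement can be sketched with only the standard library (`time` and `tracemalloc`); this is an illustrative stand-in, not the paper's tooling.

```python
# Minimal stdlib sketch of the profiling step: measure wall-clock time
# and peak traced memory for one call of a target function.
import time
import tracemalloc

def measure(fn, *args):
    """Return (result, elapsed_seconds, peak_bytes) for fn(*args)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # peak allocation since start()
    tracemalloc.stop()
    return result, elapsed, peak

result, elapsed, peak = measure(sum, range(100_000))
```

In the actual framework, per-line reports from line_profiler and memory_profiler would play the role of `elapsed` and `peak` here, giving the LLM a finer-grained overhead analysis.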
Experiment Setup | Yes | "We carefully design prompts to guide LLMs in optimizing code efficiency while ensuring the optimized code passes predefined test cases. The prompt template (Figure 3) used in EffiLearner's self-optimization stage includes a task description, test case, initial code, overhead analysis, and optimization rules. To investigate the impact of the number of self-optimization steps on the efficiency of the EffiLearner-optimized code, we conduct an ablation study by varying the number of steps from 0 to 5."
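The prompt template's five components (task description, test case, initial code, overhead analysis, optimization rules) can be assembled as below. The exact wording of the paper's Figure 3 is not reproduced in this report, so the template text here is a hypothetical placeholder showing only the structure.

```python
# Hedged sketch of the self-optimization prompt assembly; the field names
# follow the paper's description, but the template wording is assumed.
PROMPT_TEMPLATE = """\
Task description:
{task}

Test case:
{test_case}

Initial code:
{code}

Overhead analysis:
{overhead}

Optimization rules:
- The optimized code must pass all predefined test cases.
- Reduce execution time and memory usage where possible.
"""

def build_prompt(task: str, test_case: str, code: str, overhead: str) -> str:
    """Fill the five-part template with one task's data."""
    return PROMPT_TEMPLATE.format(
        task=task, test_case=test_case, code=code, overhead=overhead
    )
```

Each self-optimization step would rebuild this prompt with the latest code and its fresh overhead report before querying the LLM.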