Uncovering and Quantifying Social Biases in Code Generation
Authors: Yan Liu, Xiaokang Chen, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang Lou, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) of varying sizes reveal severe social biases. |
| Researcher Affiliation | Collaboration | Microsoft Research; Peking University; The Chinese University of Hong Kong; IBM Research. {runningmelles, ho.tsungyi, fenj.zhang}@gmail.com, pkucxk@pku.edu.cn, zhesu@andrew.cmu.edu, daoguang@iscas.ac.cn, pin-yu.chen@ibm.com, {yan.gao, jlou}@microsoft.com |
| Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, trained classifier, and data are available at https://github.com/theNamek/Code-Bias.git. |
| Open Datasets | Yes | Our code, trained classifier, and data are available at https://github.com/theNamek/Code-Bias.git. |
| Dataset Splits | Yes | Annotated data is randomly partitioned into train, development, and test sets with a ratio of 7:2:1. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper mentions running experiments and evaluating models, but it does not specify any particular hardware details such as GPU models, CPU types, or memory configurations used for the experiments. |
| Software Dependencies | No | The paper mentions using specific models like 'BERT-Base [7]' and 'word2vec' for classifiers, but it does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We conduct experiments to study the effects of hyper-parameters of code generation models on the social biases in the code generated by CodeGen-6B. We mainly analyze two hyper-parameters: temperature t [1] and top-p [14]... We set the values of temperature t from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}... We set the values of top-p from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. (A sweep sketch follows the table.) |
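
The Dataset Splits row reports a random 7:2:1 partition of the annotated data into train, development, and test sets. Below is a minimal sketch of such a split over a plain Python list of examples; the function name and fixed seed are illustrative assumptions, not details from the paper:

```python
import random

def split_7_2_1(examples, seed=0):
    """Randomly partition examples into train/dev/test at a 7:2:1 ratio."""
    # Shuffle a copy so the caller's list is untouched; the seed value is
    # an assumption for reproducibility, not one reported in the paper.
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.7 * n)   # 70% train
    n_dev = int(0.2 * n)     # 20% development
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remaining ~10% test
    return train, dev, test
```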
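
The Experiment Setup row sweeps temperature and top-p over {0.1, ..., 0.9} on CodeGen-6B. The following is a hedged sketch of such a sweep using Hugging Face transformers, substituting the smaller Salesforce/codegen-350M-mono checkpoint and a placeholder prompt; the paper's exact prompts and decoding defaults are not reproduced here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Salesforce/codegen-350M-mono"  # small stand-in for CodeGen-6B
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

prompt = "def find_good_people(candidates):"  # placeholder code prompt
inputs = tokenizer(prompt, return_tensors="pt")
values = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9

for t in values:  # temperature sweep; repeat analogously for top_p
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=True,
            temperature=t,
            top_p=1.0,          # held fixed while sweeping temperature
            max_new_tokens=64,
            pad_token_id=tokenizer.eos_token_id,
        )
    print(f"temperature={t}")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Sampled completions at each setting would then be passed to the paper's trained classifier to score social bias; that scoring step is omitted here.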