When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Authors: Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that we can avoid displaying a significant fraction of suggestions that would have been rejected. |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²Microsoft Research; mozannar@mit.edu |
| Pseudocode | No | The paper does not contain any sections or blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code is available² and additional details can be found in the appendix. (Footnote 2: https://github.com/microsoft/coderec_programming_states) |
| Open Datasets | No | To build and evaluate our methods, we extract a large number of telemetry logs from Copilot users (mostly software engineers and researchers) at Microsoft. Programmers provided consent for the use of their data, and its use was approved by Microsoft's ethics advisory board. |
| Dataset Splits | Yes | We split the telemetry dataset in a 70:10:20 split for training, validation, and testing respectively. |
| Hardware Specification | No | The time to compute the features needed for the models and performing inference on a single data point can take 10ms with a GPU and less than 1ms on a CPU when omitting embeddings, in addition to latency of sending and receiving information between server and client. This mentions general 'GPU' and 'CPU' but does not provide specific model numbers or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions software like eXtreme Gradient Boosting (XGB), CodeBERT, and Tree-sitter Parser, but does not specify their version numbers, nor any programming language versions or other libraries with specific version details. |
| Experiment Setup | Yes | We set the thresholds t1, t2, tr on the validation set for CDHF and evaluate on the test set. ... Our proposed approach is as follows: Each time the programmer pauses typing, we decide using a predictor whether to show a suggestion. Crucially, we do this using a two-stage scheme... |
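The 70:10:20 train/validation/test split reported under Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a seeded random shuffle over individual telemetry records, and the function name `split_70_10_20` is hypothetical. The summary above does not say whether the split is made per event or per programmer.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_10_20(
    items: Sequence[T], seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `items` and split them 70:10:20 into train/validation/test.

    Hypothetical helper for illustration only.
    """
    shuffled = list(items)
    # A seeded, private RNG keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids float rounding in the split sizes;
    # any remainder goes to the test set.
    n_train = n * 7 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_70_10_20(range(1000), seed=42)
print(len(train), len(val), len(test))  # 700 100 200
```

To keep all records from one programmer in a single set, the same function could be applied to a list of programmer IDs rather than to individual records. Whether the authors did this is not stated here.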
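The two-stage display decision and the validation-set threshold tuning described under Experiment Setup can be illustrated with a minimal sketch. It rests on several assumptions not confirmed by this summary. Each stage is taken to output a probability of acceptance (for example, from an XGB model). Stage 1 is assumed to see only the context, while stage 2 also sees the generated suggestion. A suggestion is hidden whenever a stage's score falls below its threshold. The functions `should_show`, `tune_threshold` and `fraction_rejected_hidden` and the `max_accept_loss` criterion are hypothetical. The source also lists a third threshold, tr, whose role is not described here, so the sketch omits it.

```python
from typing import Callable, Sequence


def should_show(
    p_context: float,
    score_with_suggestion: Callable[[], float],
    t1: float,
    t2: float,
) -> bool:
    """Hypothetical two-stage display decision (sketch, not the authors' code)."""
    # Stage 1: context-only acceptance probability. Below t1 we stop here,
    # so no suggestion is generated or shown.
    if p_context < t1:
        return False
    # Stage 2: only now is the suggestion generated and scored together with
    # the context; it is shown only if that score reaches t2.
    return score_with_suggestion() >= t2


def tune_threshold(
    probs: Sequence[float],
    accepted: Sequence[bool],
    max_accept_loss: float,
) -> float:
    """Largest candidate threshold that hides at most `max_accept_loss`
    (a fraction) of the accepted suggestions. Hypothetical criterion."""
    n_accepted = sum(accepted)
    best = 0.0
    # The number of hidden accepted suggestions never decreases as t grows,
    # so the last candidate that satisfies the constraint is the largest one.
    for t in sorted(set(probs)):
        hidden_accepted = sum(1 for p, a in zip(probs, accepted) if a and p < t)
        if n_accepted == 0 or hidden_accepted / n_accepted <= max_accept_loss:
            best = t
    return best


def fraction_rejected_hidden(
    probs: Sequence[float], accepted: Sequence[bool], t: float
) -> float:
    """Share of rejected suggestions that threshold t would have hidden."""
    rejected = [p for p, a in zip(probs, accepted) if not a]
    if not rejected:
        return 0.0
    return sum(1 for p in rejected if p < t) / len(rejected)


# Toy numbers (not data from the paper): tune on "validation", evaluate on "test".
val_probs = [0.1, 0.2, 0.3, 0.8, 0.9]
val_accepted = [False, False, True, True, True]
t = tune_threshold(val_probs, val_accepted, max_accept_loss=0.0)
test_probs = [0.05, 0.25, 0.4, 0.7]
test_accepted = [False, False, True, True]
print(t, fraction_rejected_hidden(test_probs, test_accepted, t))  # 0.3 1.0
```

Under these assumptions, the same tuning routine could be run separately on stage-1 scores to pick t1 and on stage-2 scores to pick t2. Passing stage 2 as a callable reflects the motivation for a two-stage scheme: when stage 1 already rules a suggestion out, the more expensive generation step is skipped.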