Probing the Decision Boundaries of In-context Learning in Large Language Models
Authors: Siyan Zhao, Tung Nguyen, Aditya Grover
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. ... This paper investigates the factors influencing these decision boundaries and explores methods to enhance their generalizability. We assess various approaches, including training-free and fine-tuning methods for LLMs, the impact of model architecture, and the effectiveness of active prompting techniques for smoothing decision boundaries in a data-efficient manner. |
| Researcher Affiliation | Academia | Siyan Zhao, Tung Nguyen, Aditya Grover Department of Computer Science University of California Los Angeles {siyanz,tungnd,adityag}@cs.ucla.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at https://github.com/siyan-zhao/ICL_decision_boundary. |
| Open Datasets | Yes | We generate classification datasets using scikit-learn [Pedregosa et al., 2011], creating three types of linear and non-linear classification tasks: linear, circle, and moon, each describing different shapes of ground-truth decision boundaries. Detailed information on the dataset generation can be found in Appendix G. |
| Dataset Splits | No | The paper describes 'training tasks' and 'testing tasks' and their parameters for dataset generation in Appendix G, but does not explicitly mention a 'validation' split. |
| Hardware Specification | No | The paper mentions generating decision boundaries with '8-bit quantization due to computational constraints,' implying hardware was used, but it does not specify any particular GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using 'scikit-learn [Pedregosa et al., 2011]' for dataset generation but does not provide a specific version number for scikit-learn or any other software dependency. |
| Experiment Setup | Yes | We choose a grid size scale of 50 x 50, resulting in 2500 queries for each decision boundary. ... To do this, we finetune a pretrained Llama model [Touvron et al., 2023] on a set of 1000 binary classification tasks... For each task, we randomly sample N = 256 data points... We then sample the number of context points m ~ U[8, 128], and finetune the LLM to predict y_{i>m} given x_{i>m} and the preceding examples... We finetune the pretrained LLM using LoRA [Hu et al., 2021] and finetune the attention layers. ... In our experiments, we used several classical machine learning models with the following hyperparameters: Decision Tree Classifier: We set the maximum depth of the tree to 3. Multi-Layer Perceptron: The neural network consists of two hidden layers, each with 256 neurons, and the maximum number of iterations is set to 1000. K-Nearest Neighbors: The number of neighbors is set to 5. Support Vector Machine (SVM): We used a radial basis function (RBF) kernel with a gamma value of 0.2. |
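The Open Datasets and Experiment Setup rows describe three scikit-learn task families (linear, circle, moon) and a 50 x 50 query grid (2500 points) per decision boundary. The sketch below illustrates that setup; the exact generation parameters are in the paper's Appendix G and are not reproduced here, so the noise levels and `make_classification` arguments are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification, make_circles, make_moons

def generate_task(kind, n_points=128, seed=0):
    """Sample one 2-D binary classification task of the given family.

    Noise levels and make_classification settings are illustrative
    defaults, not the authors' Appendix G values.
    """
    if kind == "linear":
        X, y = make_classification(n_samples=n_points, n_features=2,
                                   n_informative=2, n_redundant=0,
                                   random_state=seed)
    elif kind == "circle":
        X, y = make_circles(n_samples=n_points, noise=0.05, random_state=seed)
    elif kind == "moon":
        X, y = make_moons(n_samples=n_points, noise=0.1, random_state=seed)
    else:
        raise ValueError(f"unknown task kind: {kind}")
    return X, y

def boundary_grid(X, size=50):
    """Build the size x size grid of query points spanning the data range.

    With size=50 this yields the 2500 queries used to trace each
    decision boundary.
    """
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), size)
    ys = np.linspace(X[:, 1].min(), X[:, 1].max(), size)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])  # shape (size*size, 2)
```

In the paper's probing setup, each of the 2500 grid points would be serialized into the LLM prompt as a query following the in-context examples; here the grid is only constructed, not queried.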
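The classical-baseline hyperparameters quoted above (tree depth 3; MLP with two 256-unit hidden layers and 1000 max iterations; KNN with 5 neighbors; RBF-kernel SVM with gamma 0.2) map directly onto scikit-learn estimators. A minimal sketch, assuming standard scikit-learn classes:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Classical baselines with the hyperparameters stated in the
# Experiment Setup row; all other settings are sklearn defaults.
baselines = {
    "decision_tree": DecisionTreeClassifier(max_depth=3),
    "mlp": MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf", gamma=0.2),
}
```

Each baseline would be fit on the same in-context examples given to the LLM and then evaluated on the boundary query grid, giving a reference decision boundary per task.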