Probing the Decision Boundaries of In-context Learning in Large Language Models

Authors: Siyan Zhao, Tung Nguyen, Aditya Grover

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. ... This paper investigates the factors influencing these decision boundaries and explores methods to enhance their generalizability. We assess various approaches, including training-free and fine-tuning methods for LLMs, the impact of model architecture, and the effectiveness of active prompting techniques for smoothing decision boundaries in a data-efficient manner."
Researcher Affiliation | Academia | "Siyan Zhao, Tung Nguyen, Aditya Grover, Department of Computer Science, University of California, Los Angeles, {siyanz,tungnd,adityag}@cs.ucla.edu"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is released at https://github.com/siyan-zhao/ICL_decision_boundary."
Open Datasets | Yes | "We generate classification datasets using scikit-learn [Pedregosa et al., 2011], creating three types of linear and non-linear classification tasks: linear, circle, and moon, each describing different shapes of ground-truth decision boundaries. Detailed information on the dataset generation can be found in Appendix G."
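The three task families named above map directly onto standard scikit-learn generators. The sketch below shows a plausible generation setup; the sample count follows the paper's N = 256, but the noise levels, class-separation settings, and seeds are assumptions (the paper's actual parameters are in its Appendix G).

```python
# Sketch of the paper's three synthetic binary-classification task types
# (linear, circle, moon) using scikit-learn's built-in generators.
# Noise/seed values here are illustrative assumptions, not the paper's.
from sklearn.datasets import make_classification, make_circles, make_moons

n_points = 256  # per-task sample count reported in the paper

# Linearly separable task: 2D features, one cluster per class
X_lin, y_lin = make_classification(
    n_samples=n_points, n_features=2, n_informative=2,
    n_redundant=0, n_clusters_per_class=1, random_state=0)

# Circular decision boundary: inner vs. outer ring
X_cir, y_cir = make_circles(
    n_samples=n_points, noise=0.05, factor=0.5, random_state=0)

# Interleaved half-moons: non-linear, non-radial boundary
X_moo, y_moo = make_moons(n_samples=n_points, noise=0.1, random_state=0)

for name, (X, y) in {"linear": (X_lin, y_lin),
                     "circle": (X_cir, y_cir),
                     "moon": (X_moo, y_moo)}.items():
    print(name, X.shape, y.shape)
```

Each generator returns 2D points with binary labels, which is what makes the resulting decision boundaries easy to visualize on a grid.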
Dataset Splits | No | The paper describes 'training tasks' and 'testing tasks' and their dataset-generation parameters in Appendix G, but does not explicitly mention a validation split.
Hardware Specification | No | The paper mentions generating decision boundaries with "8-bit quantization due to computational constraints," implying hardware limits, but it does not specify GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions using scikit-learn [Pedregosa et al., 2011] for dataset generation but does not provide a version number for scikit-learn or any other software dependency.
Experiment Setup | Yes | "We choose a grid size of 50 × 50, resulting in 2500 queries for each decision boundary. ... To do this, we finetune a pretrained Llama model [Touvron et al., 2023] on a set of 1000 binary classification tasks... For each task, we randomly sample N = 256 data points... We then sample the number of context points m ~ U[8, 128], and finetune the LLM to predict y_{i>m} given x_{i>m} and the preceding examples... We finetune the pretrained LLM using LoRA [Hu et al., 2021] and finetune the attention layers. ... In our experiments, we used several classical machine learning models with the following hyperparameters: Decision Tree Classifier: we set the maximum depth of the tree to 3. Multi-Layer Perceptron: the network consists of two hidden layers, each with 256 neurons, and the maximum number of iterations is set to 1000. K-Nearest Neighbors: the number of neighbors is set to 5. Support Vector Machine (SVM): we used a radial basis function (RBF) kernel with a gamma value of 0.2."
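The classical-baseline hyperparameters and the 50 × 50 query grid quoted above translate directly into scikit-learn. The sketch below instantiates the four baselines with exactly the stated settings and probes each fitted model on 2500 grid points, as the paper does for LLM decision boundaries; the moon-task data and its noise/seed are illustrative assumptions.

```python
# Sketch: classical baselines with the paper's stated hyperparameters,
# queried on a 50x50 grid (2500 points) to trace each decision boundary.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

baselines = {
    "decision_tree": DecisionTreeClassifier(max_depth=3),
    "mlp": MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf", gamma=0.2),
}

# Illustrative in-context data: one moon task (noise/seed are assumptions)
X, y = make_moons(n_samples=128, noise=0.1, random_state=0)

# 50 x 50 grid spanning the data range -> 2500 queries per boundary
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
ys = np.linspace(X[:, 1].min(), X[:, 1].max(), 50)
grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)

for name, clf in baselines.items():
    clf.fit(X, y)
    preds = clf.predict(grid)  # one label per grid query point
    print(name, preds.shape)
```

Reshaping `preds` back to (50, 50) gives the boundary image the paper visualizes; the same grid is what turns an LLM's per-query predictions into a comparable picture.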