Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On Understanding Attention-Based In-Context Learning for Categorical Data
Authors: Aaron T Wang, William Convertino, Xiang Cheng, Ricardo Henao, Lawrence Carin
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the framework empirically on synthetic data, image classification and language generation. 2. We empirically validate our framework through experiments on diverse datasets: (a) We tackle in-context image classification on ImageNet (Russakovsky et al., 2014)... (b) We apply our GD-based model to language generation, training on a combined corpus of TinyStories and Children Stories (Eldan & Li, 2023)... |
| Researcher Affiliation | Academia | 1Electrical & Computer Engineering Dept., Duke University, Durham, NC, USA. Correspondence to: Lawrence Carin <EMAIL>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided, but the model architecture and steps are described in prose and diagrams in Section 3 and Figures 1 and 2. |
| Open Source Code | Yes | Code needed to replicate our experiments is at https://github.com/aarontwang/icl_attention_categorical. |
| Open Datasets | Yes | We tackle in-context image classification on ImageNet (Russakovsky et al., 2014)... We apply our GD-based model to language generation, training on a combined corpus of TinyStories and Children Stories (Eldan & Li, 2023)... Footnote 1: https://huggingface.co/datasets/ajibawa-2023/Children-Stories-Collection |
| Dataset Splits | Yes | For each contextual set C(l), 5 distinct classes are selected uniformly at random, and for each such class 10 specific images are selected at random, and therefore N = 50 (image N + 1 is selected at random from the 5 class types considered in the context data). When training L = 2048, and test performance is averaged for M = 2048. |
| Hardware Specification | Yes | All experiments were performed on a Tesla V100 PCIe 16 GB GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or programming languages used for their implementation. It only mentions the use of "GPT-4o model" for evaluation. |
| Experiment Setup | Yes | embedding vectors are learned for each token, with C = 50,257 unique tokens represented and an embedding dimension d = 512; 8 attention heads are used for both models. Additionally, positional embedding vectors are learned for each of the 256 positions in our model's context window, with an additional 257th position learned for the GD model (for position x_{N+1}). |
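
The contextual-set construction quoted in the Dataset Splits row (5 classes chosen uniformly at random, 10 images per class, so N = 50, plus a query image N + 1 drawn from one of those 5 classes) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_context_set`, the toy `images_by_class` dictionary, the shuffling of the context, and the choice to keep the query image out of the context are all assumptions made here.

```python
import random


def sample_context_set(images_by_class, n_classes=5, n_per_class=10, rng=None):
    """Sample one contextual set of N = n_classes * n_per_class labeled
    images plus one query image (hypothetical helper, not from the paper)."""
    rng = rng or random.Random()

    # Choose the classes for this contextual set uniformly at random.
    classes = rng.sample(sorted(images_by_class), n_classes)

    # Draw n_per_class distinct images per class; labels are local indices 0..n_classes-1.
    context = []
    for label, cls in enumerate(classes):
        for image in rng.sample(images_by_class[cls], n_per_class):
            context.append((image, label))
    rng.shuffle(context)  # assumption: context order is randomized

    # The query (image N + 1) comes from one of the classes in the context.
    query_label = rng.randrange(n_classes)
    used = {image for image, label in context if label == query_label}
    candidates = [
        image for image in images_by_class[classes[query_label]]
        if image not in used  # assumption: the query is not a context image
    ]
    query_image = rng.choice(candidates)
    return context, (query_image, query_label)


# Toy stand-in for an image dataset: 8 classes with 20 image ids each.
images_by_class = {
    f"class_{c}": [f"class_{c}/img_{i}" for i in range(20)] for c in range(8)
}
context, query = sample_context_set(images_by_class, rng=random.Random(0))
print(len(context), query[1] in range(5))
```

Calling the function L = 2048 times would yield a training batch of contextual sets under these assumptions; the repository linked in the Open Source Code row is the authoritative reference for the actual sampling procedure.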