Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

When and How Unlabeled Data Provably Improve In-Context Learning

Authors: Yingcong Li, Xiangyu Chang, Muti Kara, Xiaofeng Liu, Amit K. Roy-Chowdhury, Samet Oymak

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a comprehensive theoretical study to show that: (1) The loss landscape of one-layer linear attention models recover the optimal fully-supervised estimator but completely fail to exploit unlabeled data; (2) In contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i \ge 0} a_i (X^\top X)^i X^\top y$ with $X$ and $y$ denoting features and partially-observed labels (with missing entries set to zero). We characterize the class of polynomials that can be expressed as a function of depth and draw connections to Expectation Maximization, an iterative pseudo-labeling algorithm commonly used in semi-supervised learning. Importantly, the leading polynomial power is exponential in depth, so mild amount of depth/looping suffices. As an application of theory, we propose looping off-the-shelf tabular foundation models to enhance their semi-supervision capabilities. Extensive evaluations on real-world datasets show that our method significantly improves the semi-supervised tabular learning performance over the standard single pass inference.
Researcher Affiliation | Academia | Yingcong Li (1,4), Xiangyu Chang (2), Muti Kara (3), Xiaofeng Liu (1), Amit Roy-Chowdhury (2), Samet Oymak (1). Affiliations: (1) University of Michigan; (2) University of California, Riverside; (3) Bilkent University; (4) NJIT
Pseudocode | Yes | Algorithm 1 LoopTabFM: Looping Tabular FM with Soft Pseudo-labels and Risk-aware Updates
Open Source Code | Yes | Our code is available at https://github.com/xiaofengliu-water/LoopTabFM.
Open Datasets | Yes | We evaluated the effectiveness of our proposed looping strategy by iteratively applying TabPFN-v2 on real-world binary classification benchmarks used in Hollmann et al. (2025). The results are summarized in Table 1, where each entry represents an average over 100 random splits of the dataset, with 80% of the data used as the test set in each split.
Dataset Splits | Yes | The results are summarized in Table 1, where each entry represents an average over 100 random splits of the dataset, with 80% of the data used as the test set in each split. For each experiment, we randomly sample 10 labeled and 10 unlabeled examples, ensuring that the labeled set includes at least one example from each class.
Hardware Specification | No | No specific hardware details (like GPU/CPU models) are mentioned for the experimental setup. The paper only states: "All models are trained using Adam optimizer with a learning rate of $10^{-3}$ for 40,000 epochs, with a batch size of 512. We use logistic loss in our experiments."
Software Dependencies | No | No specific software versions (e.g., Python 3.8, PyTorch 1.9) are mentioned. The paper only states: "All models are trained using Adam optimizer with a learning rate of $10^{-3}$ for 40,000 epochs, with a batch size of 512. We use logistic loss in our experiments."
Experiment Setup | Yes | All models are trained using Adam optimizer with a learning rate of $10^{-3}$ for 40,000 epochs, with a batch size of 512. We use logistic loss in our experiments.
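The Dataset Splits entry describes holding out 80% of each dataset as the test set and then drawing 10 labeled and 10 unlabeled examples, with every class represented in the labeled set. The following is a minimal sketch of one such split, not the authors' code. The function name `sample_split`, the rejection-sampling loop, and the choice to draw the labeled and unlabeled examples from the remaining 20% are all assumptions made for illustration; the scorecard does not state how the paper implements this step.

```python
import random


def sample_split(labels, n_labeled=10, n_unlabeled=10, test_frac=0.8,
                 seed=0, max_tries=1000):
    """Return (labeled, unlabeled, test) index lists for one random split.

    Hypothetical helper: the 80% test fraction and the 10/10 sizes come from
    the scorecard; drawing from the non-test pool is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Hold out the test set first, then sample from what is left.
    n_test = int(round(test_frac * len(indices)))
    test, pool = indices[:n_test], indices[n_test:]
    if len(pool) < n_labeled + n_unlabeled:
        raise ValueError("not enough non-test examples for the requested sizes")

    classes = set(labels)
    # Rejection sampling: redraw until the labeled set covers every class.
    for _ in range(max_tries):
        chosen = rng.sample(pool, n_labeled + n_unlabeled)
        labeled, unlabeled = chosen[:n_labeled], chosen[n_labeled:]
        if {labels[i] for i in labeled} == classes:
            return labeled, unlabeled, test
    raise ValueError("could not cover every class in the labeled set")
```

Repeating the call with 100 different `seed` values would give 100 random splits to average over, in the spirit of the protocol quoted above.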
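The Research Type entry quotes estimators of the form "sum over i >= 0 of a_i (X^T X)^i X^T y", where X holds the features of all examples and y holds the labels with missing entries set to zero. As a rough illustration of why such an estimator can use unlabeled data, the sketch below evaluates a truncated version of that sum in plain Python. The function names and the coefficient values are hypothetical; the paper's transformers learn their coefficients implicitly, and this is not their implementation.

```python
def xt_times(X, u):
    """Compute X^T u for an n-by-d matrix X (list of rows) and length-n vector u."""
    n, d = len(X), len(X[0])
    return [sum(X[r][c] * u[r] for r in range(n)) for c in range(d)]


def x_times(X, v):
    """Compute X v for an n-by-d matrix X and length-d vector v."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in X]


def polynomial_estimator(X, y_masked, coeffs):
    """Evaluate sum_i coeffs[i] * (X^T X)^i X^T y_masked.

    y_masked must already have unlabeled entries set to zero.
    coeffs is a hypothetical finite list [a_0, a_1, ...].
    """
    term = xt_times(X, y_masked)          # i = 0 term: X^T y
    estimate = [0.0] * len(term)
    for a in coeffs:
        estimate = [e + a * t for e, t in zip(estimate, term)]
        term = xt_times(X, x_times(X, term))  # multiply by X^T X for the next power
    return estimate


# Two labeled rows and one unlabeled row (label set to zero).
X = [[1, 0], [0, 2], [1, 1]]
y_masked = [1, -1, 0]
print(polynomial_estimator(X, y_masked, [1, 1]))  # prints [1.0, -11.0]
```

The unlabeled row contributes nothing to the i = 0 term because its label is zero, but it does enter X^T X, so every higher-order term depends on the unlabeled features. With a single coefficient (`coeffs=[1]`) the same call returns `[1.0, -2.0]`, which ignores the unlabeled row entirely.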