Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Online Feature Updates Improve Online (Generalized) Label Shift Adaptation
Authors: Ruihan Wu, Siddhartha Datta, Yi Su, Dheeraj Baby, Yu-Xiang Wang, Kilian Q. Weinberger
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiment): In this section, we initiate OLS-OFU with three popular SSL techniques and empirically evaluate how OLS-OFU improves the original OLS methods on both online label shift and online generalized label shift on various datasets and shift patterns. |
| Researcher Affiliation | Collaboration | Ruihan Wu (UC San Diego); Siddhartha Datta (University of Oxford); Yi Su (Google DeepMind); Dheeraj Baby (UC Santa Barbara); Yu-Xiang Wang (UC San Diego); Kilian Q. Weinberger (Cornell University) |
| Pseudocode | Yes | Algorithm 1 Online label shift adaptation with online feature updates (OLS-OFU). |
| Open Source Code | Yes | Code is released at https://github.com/dattasiddhartha/online-feature-updates-olsofu |
| Open Datasets | Yes | For online label shift, we evaluate the efficacy of our algorithm on CIFAR-10 [29], STL10 [12], CINIC [13], and EuroSAT [25]. For online generalized label shift, the offline train and validation sets are the CIFAR-10 images. The test unlabeled batches are drawn from CIFAR-10C [26], a benchmark with the same objects as CIFAR-10 but with various types of corruption. |
| Dataset Splits | Yes | For each dataset, we split the original train set into the offline train (i.e., D0) and validation sets (i.e., D′0) following a ratio of 4:1. |
| Hardware Specification | No | The paper mentions evaluating methods with ResNet18 and discusses time costs, but does not specify the hardware (e.g., GPU model, CPU, memory) used for these experiments. |
| Software Dependencies | No | The paper refers to using PyTorch (implicitly for neural networks) and ResNet18, but does not specify version numbers for any software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We experiment with T = 1000 and batch size B = 10 at each time step, following Baby et al. [6]. The frequency parameter τ is fixed as 100 for most experiments unless we particularly mention it. The seed used to train our model is 4242, and we train an additional 4 models on seeds 4343, 4545, 4646, 4747. |
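
The split and setup values reported in the table can be sketched as a small configuration, for readers who want to check the arithmetic. This is a minimal illustration, not the authors' code: the names `ExperimentConfig` and `split_offline_indices` are hypothetical, the shuffled index split and the reuse of the first training seed for it are assumptions, and 50,000 is the size of the standard CIFAR-10 train set, used here only as an example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Values quoted in the 'Experiment Setup' row (field names are hypothetical)."""
    num_steps: int = 1000          # T: number of online time steps
    batch_size: int = 10           # B: unlabeled samples per time step
    update_frequency: int = 100    # tau: frequency parameter used in most experiments
    seeds: tuple = (4242, 4343, 4545, 4646, 4747)  # one model trained per seed


def split_offline_indices(num_examples, seed, train_parts=4, val_parts=1):
    """Shuffle example indices and split them train:val = 4:1 by default.

    Assumption: the source states only the ratio, not how examples are assigned.
    """
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    cut = num_examples * train_parts // (train_parts + val_parts)
    return indices[:cut], indices[cut:]


config = ExperimentConfig()
train_idx, val_idx = split_offline_indices(50_000, seed=config.seeds[0])

print(len(train_idx), len(val_idx))           # sizes of the offline train and validation sets
print(config.num_steps * config.batch_size)   # unlabeled test samples seen per run
```

With a 4:1 ratio, a 50,000-image train set yields 40,000 offline training and 10,000 validation examples, and T = 1000 steps at B = 10 means each run processes 10,000 unlabeled test samples.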