Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation

Authors: Jun-Kun Wang, Andre Wibisono

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we aim to theoretically understand GD with hard and conjugate labels for a binary classification problem. We show that for square loss, GD with conjugate labels converges to an ϵ-optimal predictor under a Gaussian model for any arbitrarily small ϵ, while GD with hard pseudo-labels fails at this task. We also analyze them under different loss functions for the update. Our results shed light on when and why GD with hard or conjugate labels works in test-time adaptation. Figure 1: Expected 0-1 loss vs. test-time iteration of GD; GD with hard labels under square loss (blue solid line) cannot converge to the class mean µ in direction, while GD with conjugate labels under square loss (green dash-dot line) learns an ϵ-optimal predictor. Figure 4: Expected 0-1 loss Φ(−⟨µ, w_t⟩/(σ‖w_t‖)) vs. test time t; left: GD + hard + exp and GD + conj + exp; right: GD + hard + logistic and GD + conj + logistic. Here the best minimal error is Φ(−‖µ‖/σ) (recall the discussion in Section 3). Both figures show that GD with conjugate labels outperforms GD with hard labels.
Researcher Affiliation | Academia | Jun-Kun Wang and Andre Wibisono, Department of Computer Science, Yale University, {jun-kun.wang,andre.wibisono}@yale.edu
Pseudocode | Yes | Algorithm 1: Test-time adaptation via pseudo-labeling (a hedged code sketch of this adaptation loop, with both hard and conjugate pseudo-labels under square loss, appears after the table).
Open Source Code | No | The information is insufficient. The paper does not contain an explicit statement about releasing its source code or a link to a code repository.
Open Datasets | No | The information is insufficient. The paper uses a synthetic Gaussian model for its theoretical analysis and simulations, explicitly stating 'samples from the new domain are generated as x ∼ N(yµ, σ²I_d) ∈ R^d' and describing 'the setting of noisy data, i.e., x_t ∈ R^d is sampled as x_t ∼ N(yµ_T, σ_T² I_d)'. It does not refer to a publicly available or open dataset.
Dataset Splits | No | The information is insufficient. The paper describes a test-time adaptation setting where a model adapts to incoming unlabeled samples from a new domain. It does not mention conventional training, validation, or test dataset splits, as the process involves sequential adaptation rather than static dataset partitioning.
Hardware Specification | No | The information is insufficient. The paper does not provide any specific hardware details used for running its simulations or experiments.
Software Dependencies | No | The information is insufficient. The paper does not provide any specific software dependencies or version numbers needed to replicate the experiment.
Experiment Setup | Yes | In the simulation result depicted in Figure 1, a sample x = µ arrives when the test time t is odd and a sample x = −µ arrives when t is even. Note that the algorithms do not know the labels. In the simulation result depicted in Figure 4, we consider the setting of noisy data, i.e., x_t ∈ R^d is sampled as x_t ∼ N(yµ_T, σ_T² I_d) instead of x_t = yµ_T. We search the step size η over the grid {10^{-3}, 5×10^{-3}, 10^{-2}, 5×10^{-2}, 10^{-1}, 5×10^{-1}, 10^{0}, 5×10^{0}, 10^{1}, 5×10^{1}, 10^{2}} for each of GD + hard + exp, GD + conj + exp, GD + hard + logistic, and GD + conj + logistic, and report the best result for each. (See the grid-search sketch after the table.)
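
To make Algorithm 1 and the two pseudo-label choices concrete, here is a minimal NumPy sketch of the test-time adaptation loop under square loss in the Figure 1 setting, where x_t alternates between µ and −µ. The hard-label step descends 0.5(⟨w, x⟩ − sign(⟨w, x⟩))²; the conjugate-label step assumes the conjugate objective for square loss reduces to −0.5⟨w, x⟩² (following the conjugate pseudo-label construction of Goyal et al.), giving w_{t+1} = w_t + η⟨w_t, x_t⟩x_t. The function names, dimension, horizon, step size, and initialization below are illustrative choices, not values from the paper.

```python
import numpy as np

def hard_label_step(w, x, eta):
    """One GD step on 0.5*(<w,x> - y_hat)^2 with hard label y_hat = sign(<w,x>)."""
    h = w @ x
    y_hat = 1.0 if h >= 0 else -1.0   # tie-broken sign of the prediction
    return w - eta * (h - y_hat) * x

def conjugate_label_step(w, x, eta):
    """One GD step on the (assumed) conjugate objective -0.5*<w,x>^2."""
    return w + eta * (w @ x) * x

d, T, eta = 10, 200, 0.05             # illustrative, not the paper's values
rng = np.random.default_rng(0)
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)              # unit-norm class mean
w0 = 0.1 * rng.normal(size=d)
w_hard, w_conj = w0.copy(), w0.copy()

for t in range(1, T + 1):
    x_t = mu if t % 2 == 1 else -mu   # x = mu at odd t, x = -mu at even t
    w_hard = hard_label_step(w_hard, x_t, eta)
    w_conj = conjugate_label_step(w_conj, x_t, eta)

# Directional alignment |<mu, w>| / ||w||: 1.0 means w points along mu.
for name, w in [("hard labels", w_hard), ("conjugate labels", w_conj)]:
    print(f"{name}: alignment with mu = {abs(mu @ w) / np.linalg.norm(w):.3f}")
```

In this toy setting the printout mirrors the paper's claim: the hard-label iterate keeps its initial component orthogonal to µ (so it cannot converge to µ in direction), while the conjugate-label update is linear in w and amplifies the component along µ at every step, aligning w with µ.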
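The step-size grid search described in the Experiment Setup row can likewise be sketched. The paper's Figure 4 runs it with exponential and logistic losses; since the conjugate updates for those losses are not spelled out in this summary, the sketch below reuses the square-loss hard/conjugate updates from the previous sketch and only illustrates the protocol: sample x_t ∼ N(yµ, σ²I_d), run GD for each η in the grid, score the final iterate by the expected 0-1 loss Φ(−⟨µ, w⟩/(σ‖w‖)), and report the best η per method. All constants are illustrative assumptions.

```python
import math
import numpy as np

GRID = [1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1e0, 5e0, 1e1, 5e1, 1e2]

def expected_01_loss(w, mu, sigma):
    """Phi(-<mu,w> / (sigma*||w||)): expected 0-1 loss in the Gaussian model."""
    w = w / np.max(np.abs(w))         # rescale; the loss is scale-invariant in w
    z = -(mu @ w) / (sigma * np.linalg.norm(w))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hard_step(w, x, eta):             # square loss + hard pseudo-label
    h = w @ x
    return w - eta * (h - (1.0 if h >= 0 else -1.0)) * x

def conj_step(w, x, eta):             # assumed conjugate objective -0.5*<w,x>^2
    return w + eta * (w @ x) * x

def run(step_fn, eta, mu, sigma, T, seed=0):
    """Adapt for T steps on x_t ~ N(y*mu, sigma^2 I); inf marks divergence."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.normal(size=mu.size)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(T):
            y = rng.choice([-1.0, 1.0])   # true label, unseen by the algorithm
            x = y * mu + sigma * rng.normal(size=mu.size)
            w = step_fn(w, x, eta)
            if not np.isfinite(w).all():
                return math.inf           # this step size diverged
    return expected_01_loss(w, mu, sigma)

d, T, sigma = 10, 500, 0.5            # illustrative, not the paper's values
rng = np.random.default_rng(1)
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)

for name, fn in [("GD + hard + square", hard_step),
                 ("GD + conj + square", conj_step)]:
    best_err, best_eta = min((run(fn, eta, mu, sigma, T), eta) for eta in GRID)
    print(f"{name}: best expected 0-1 loss {best_err:.3f} at eta = {best_eta}")

# The best achievable error in this model is Phi(-||mu||/sigma).
print(f"optimal: {0.5 * (1.0 + math.erf(-1.0 / (sigma * math.sqrt(2.0)))):.3f}")
```

Treating non-finite iterates as a diverged run matters here: large step sizes in the grid (up to 10²) blow up the square-loss updates, and the search should record such runs as failures rather than crash, which is exactly why the paper reports only the best result per method over the grid.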