Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Offline-to-Online Hyperparameter Transfer for Stochastic Bandits
Authors: Dravyansh Sharma, Arun Suggala
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback. In this section, we provide empirical evidence for the significance of our hyperparameter transfer framework on real and synthetic data. |
| Researcher Affiliation | Collaboration | Dravyansh Sharma (TTIC), Arun Suggala (Google DeepMind). EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: UCB(α). Input: Arms {1, ..., n}, max steps T. Output: Arm pulls {A_t ∈ [n]}, t ∈ [T]. Algorithm 2: TUNEDUCB(α_min, α_max). Input: Parameter interval [α_min, α_max], arm rewards r_ijk, i ∈ [n], j ∈ [T], k ∈ [N], from offline data. Output: Learned parameter α̂. Algorithm 3: α-CRITICALPOINTS(α_l, α_h, t_[n], µ_[n], R_[n]). Input: Parameter interval [α_min, α_max], arm pulls so far t_i, mean rewards so far µ_i, future arm rewards R_i, i ∈ [n]. Output: Learned parameter α̂. Algorithm 4: LINUCB(α). Input: Arms {1, ..., n}, max steps T, feature dimension d. Output: Arm pulls {A_t ∈ [n]}, t ∈ [T]. Algorithm 5: GP-UCB(σ²) (Srinivas et al. 2010). Input: Input space C, GP prior µ_0 = 0, σ_0, kernel k(·, ·) such that k(x, x′) ≤ 1 for any x, x′ ∈ C, {β_t}, t ∈ [T]. Output: Points {x_t ∈ C}, t ∈ [T]. |
| Open Source Code | No | The paper discusses various algorithms and their performance but does not provide an explicit statement about releasing its own source code or a link to a repository. |
| Open Datasets | Yes | We present our results for CIFAR-10 and CIFAR-100 (Krizhevsky 2009) benchmark image classification datasets. |
| Dataset Splits | Yes | For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20 |
| Hardware Specification | Yes | All our experiments on CIFAR are run on 1 Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions training neural networks via SGD (stochastic gradient descent) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The arms consist of 11 different learning rates (0.001, 0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8) and the arm reward is given by the classification accuracy of feedforward neural networks trained via SGD (stochastic gradient descent) with that learning rate and a batch size of 64 for 20 epochs. For each dataset we run Algorithm 2 over N = 200 training/offline tasks with time horizon To = 20, and run corralling for a grid of ten hyperparameter values α = {0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100}. |
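The UCB(α) routine named in the pseudocode row can be illustrated with a short sketch. This is a minimal, generic version assuming the standard index µ̂_i + α·sqrt(ln t / t_i); the paper's exact confidence width and tie-breaking may differ, and `pull` is a hypothetical reward oracle standing in for the bandit environment.

```python
import math

def ucb(alpha, n_arms, horizon, pull):
    """Minimal UCB(alpha) sketch: `pull(i)` returns a stochastic reward for arm i."""
    counts = [0] * n_arms   # t_i: number of pulls of each arm so far
    means = [0.0] * n_arms  # mu_i: running mean reward of each arm
    history = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: pull each arm once
        else:
            # index = empirical mean + alpha-scaled confidence width
            arm = max(range(n_arms),
                      key=lambda i: means[i] + alpha * math.sqrt(math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        history.append(arm)
    return history
```

With two arms of mean reward 0.1 and 0.9, the routine concentrates its pulls on the better arm while still occasionally exploring the worse one, which is the behavior the exploration parameter α trades off.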
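The offline-tuning step (Algorithm 2, TUNEDUCB) selects α̂ from N offline tasks. As a deliberately simplified sketch, the grid-based variant suggested by the experiment setup (α ∈ {0.1, ..., 100}) can be written as a search over candidates, scoring each by average cumulative reward across the offline tasks. The paper's actual algorithm computes critical points of α rather than a fixed grid, and `run_bandit` is a hypothetical helper here.

```python
def tune_alpha(alpha_grid, offline_tasks, run_bandit):
    """Grid-search sketch of offline hyperparameter transfer.

    `offline_tasks` is a list of offline bandit tasks; `run_bandit(alpha, task)`
    returns the cumulative reward of UCB(alpha) on that task. Returns the
    candidate alpha with the highest average offline reward.
    """
    def avg_reward(alpha):
        return sum(run_bandit(alpha, task) for task in offline_tasks) / len(offline_tasks)
    return max(alpha_grid, key=avg_reward)
```

The selected α̂ is then deployed on the online task, which is the offline-to-online transfer the paper studies.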