Discounted Adaptive Online Learning: Towards Better Regularization

Authors: Zhiyu Zhang, David Bombara, Heng Yang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we complement the above theoretical results with OCP experiments (Section 4). ... We test three versions of our algorithm: MAGL-D is our Algorithm 1 with ε = 1 and λ_t = 0.999; MAGL is its undiscounted version (λ_t = 1); and MAGDIS is a much simplified variant of Algorithm 1 that basically sets h_t = 0. ... The results are summarized in Table 1.
Researcher Affiliation | Academia | Harvard University. Correspondence to: Zhiyu Zhang <zhiyuz@seas.harvard.edu>, David Bombara <davidbombara@g.harvard.edu>, Heng Yang <hankyang@seas.harvard.edu>.
Pseudocode | Yes | Algorithm 1: 1D magnitude learner on [0, ∞). ... Algorithm 2 ((Zhang et al., 2024, Algorithm 1 and 2) + (Cutkosky, 2019, Algorithm 2)): Undiscounted 1D magnitude learner on [0, ∞). ... Algorithm 3: Discounted adaptivity on R^d. ... Algorithm 4: The proposed OCP algorithm ACP. ... Algorithm 5: Modified 1D magnitude learner on [0, ∞).
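The listed pseudocode lives in the paper itself. As a rough illustration of where the discount enters (not the authors' Algorithm 1, whose wealth/potential-based update is more involved), here is a minimal sketch in which a discount factor λ_t geometrically down-weights past gradient statistics; the class name, the AdaGrad-style step, and the role given to ε are our own assumptions:

```python
# Illustrative sketch only: one way a discount factor can enter an adaptive
# 1D update on [0, inf). NOT the paper's Algorithm 1.

class DiscountedAdaptive1D:
    def __init__(self, lam=0.999, eps=1.0):
        self.lam = lam  # discount factor; lam = 1 recovers the undiscounted case
        self.eps = eps  # step-size scale (hypothetical role for epsilon)
        self.v = 0.0    # discounted sum of squared gradients
        self.x = 0.0    # iterate, kept in [0, inf)

    def step(self, grad):
        # Geometrically down-weight history before adding the new gradient.
        self.v = self.lam ** 2 * self.v + grad ** 2
        # AdaGrad-style move, then project back onto the nonnegative half-line.
        self.x = max(0.0, self.x - self.eps * grad / (self.v ** 0.5 + 1e-12))
        return self.x
```

Setting lam = 1 gives an undiscounted learner in the spirit of the MAGL vs. MAGL-D comparison above.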
Open Source Code | Yes | Link to the code: https://github.com/ComputationalRobotics/discounted-adaptive.
Open Datasets | Yes | We consider image classification in a sequential setting... We adopt the procedure, code, and base model from (Bhatnagar et al., 2023). ... Results are obtained using corrupted versions of TinyImageNet, with time-varying corruption level (distribution shift).
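The quote fixes the data source (corrupted TinyImageNet) but not the exact shift schedule. To make the sequential setting concrete, here is a runnable stand-in sketch of the protocol as we read it; the corruption schedule, the score model, and every function name are our own placeholders, not the pipeline of Bhatnagar et al. (2023):

```python
import random

# Sketch of the sequential protocol: one corrupted test point per round, with
# a drifting corruption level. All helpers are hypothetical stand-ins.

def severity(t):
    return 1 + (t // 1000) % 5                # staircase corruption schedule

def score(t, rng):
    # Stand-in nonconformity score (e.g., 1 - softmax prob of the true label);
    # higher corruption makes the base model less confident on average.
    return min(1.0, rng.betavariate(2, 5) * severity(t) / 3)

def evaluate(ocp_predict, ocp_update, T=10_000, seed=0):
    rng, misses = random.Random(seed), 0
    for t in range(T):
        s = score(t, rng)
        misses += s > ocp_predict()           # miscoverage this round?
        ocp_update(s)                         # online feedback to the learner
    return misses / T                         # long-run miscoverage rate
```

Plugging any online conformal learner's predict/update pair into `evaluate` reproduces the time-varying-corruption stream in miniature.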
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. It mentions evaluation metrics such as "local coverage error" but not how the data is partitioned for validation.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions software packages such as "SciPy and JAX" but does not provide version numbers for these or any other dependencies crucial for replication.
Experiment Setup | Yes | We choose a targeted coverage rate of 90% for all experiments, which means α = 0.1. ... We test three versions of our algorithm: MAGL-D is our Algorithm 1 with ε = 1 and λ_t = 0.999; MAGL is its undiscounted version (λ_t = 1); and MAGDIS is a much simplified variant of Algorithm 1 that basically sets h_t = 0. ... The only hyperparameter that we set is the learning rate, which we set to 1 for Simple OGD.
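The quoted setup fixes α = 0.1 and a unit learning rate for the Simple OGD baseline. For concreteness, here is a minimal sketch of the pinball-loss gradient step such a baseline typically performs on the nonconformity-score stream; this is our reading of the standard OGD conformal update, not code from the paper:

```python
# Sketch of a "Simple OGD" online conformal baseline: online gradient descent
# on the pinball (quantile) loss at level 1 - alpha. Our reading of the
# standard update, not the authors' implementation.

def simple_ogd(scores, alpha=0.1, lr=1.0):
    """Yield a threshold before seeing each score, then update it online."""
    q = 0.0
    for s in scores:
        yield q
        miss = 1.0 if s > q else 0.0   # 1 on miscoverage, 0 on coverage
        q += lr * (miss - alpha)       # negative pinball-loss gradient step
```

With lr = 1 and α = 0.1, each miss raises the threshold by 0.9 and each cover lowers it by 0.1, so the long-run miss rate tracks α.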