Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
Authors: Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method outperforms state-of-the-art methods in online continual learning. Furthermore, the proposed method is evaluated in a realistic setting where the boundaries between tasks are blurred. Experimental results confirm that the proposed method outperforms the state of the art on CIFAR-10, CIFAR-100, and Tiny-ImageNet. |
| Researcher Affiliation | Collaboration | Hung-Jen Chen1, An-Chieh Cheng1, Da-Cheng Juan2, Wei Wei2, Min Sun1,3,4; 1National Tsing-Hua University, Hsinchu, Taiwan; 2Google Research, Mountain View, USA; 3Appier Inc., Taiwan; 4MOST Joint Research Center for AI Technology and All Vista Healthcare, Taiwan |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link or explicit statement about releasing its source code. |
| Open Datasets | Yes | We conduct the experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet. |
| Dataset Splits | No | The paper mentions 'validation' as a concept but does not provide specific details on validation dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | No | The paper mentions 'National Center for High-performance Computing for computer time and facilities' but does not provide specific details on hardware components like GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set the batch size to 10 for all experiments and train in an online learning manner, discarding every batch of received data after updating the weights. We train the models with either the SGD or Adam optimizer, depending on the default setting of each work, and perform a simple grid search over the number of training iterations. ... λ is a hyperparameter. ... We set κ = 10 and ϵ as a matrix filled with the value 0.5 in our experiments. |
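
The experiment-setup details quoted above (batch size 10, a single weight update per batch with no second pass over the data, SGD or Adam depending on the baseline) are enough to outline the online training protocol. The sketch below is a minimal, hedged illustration of that protocol on CIFAR-10, one of the three benchmarks listed in the paper; the model choice (ResNet-18), learning rate, and function names are assumptions for illustration and do not reproduce the authors' instance-aware parameterization or any released code.

```python
# Minimal sketch of the online training protocol described in the
# "Experiment Setup" row: batch size 10, one update per batch, data discarded
# after the update. Model, optimizer hyperparameters, and names are assumed.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader


def run_online_stream(model, optimizer, loss_fn, device="cpu"):
    # CIFAR-10 is one of the benchmarks reported in the paper.
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor()
    )
    # Batch size 10, as stated in the quoted setup; each batch is seen once.
    loader = DataLoader(train_set, batch_size=10, shuffle=True)

    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        # Online setting: the batch is discarded after this single update;
        # no replay buffer or extra epochs are used in this sketch.


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet18(num_classes=10).to(device)
    # The paper uses either SGD or Adam depending on each baseline's default;
    # SGD with lr=0.01 here is an assumed placeholder, not the paper's value.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    run_online_stream(model, optimizer, nn.CrossEntropyLoss(), device)
```

The number of iterations (and any λ, κ, or ϵ settings beyond the values quoted above) would still need to be chosen by the grid search the authors describe, which is not part of this sketch.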