Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Exploiting Task Relationships in Continual Learning via Transferability-Aware Task Embeddings

Authors: Yanru Wu, Jianning Wang, Xiangyu Chen, Aurora, Yang Tan, Hanbing Liu, Yang Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations on benchmarks including CIFAR-100, ImageNet-R, and DomainNet show that our framework performs prominently compared to various baseline and SOTA approaches, demonstrating strong potential in capturing and utilizing intrinsic task relationships. Our code is publicly available at https://github.com/viki760/Hembedding_Guided_Hypernet.
Researcher Affiliation	Academia	(1) Shenzhen International Graduate School, Tsinghua University; (2) Independent Researcher
Pseudocode	Yes	Algorithm 1: H-embedding guided Hypernet: Training of Task j
Open Source Code	Yes	Our code is publicly available at https://github.com/viki760/Hembedding_Guided_Hypernet.
Open Datasets	Yes	Extensive evaluations on benchmarks including CIFAR-100, ImageNet-R, and DomainNet show that our framework performs prominently compared to various baseline and SOTA approaches, demonstrating strong potential in capturing and utilizing intrinsic task relationships. Our code is publicly available at https://github.com/viki760/Hembedding_Guided_Hypernet.
Dataset Splits	Yes	Permuted MNIST (10 tasks) [Goodfellow et al., 2013], CIFAR-100 (10 tasks) [Krizhevsky et al., 2009], DomainNet (5 tasks) [Peng et al., 2019] and ImageNet-R (5 tasks, 10 tasks and 20 tasks) [Hendrycks et al., 2021]. A detailed description of these benchmarks is listed in Appendix A.5. [...] Permuted MNIST [Goodfellow et al., 2013] (N = 10) benchmark is a variant of MNIST [LeCun et al., 1998], forming CL tasks from the original MNIST dataset by applying random permutations to the input image pixels. The permuting procedure can be repeated 9 times in experiments to yield a task sequence of 10, with each task consisting of 70,000 images (60,000 for training and 10,000 for testing) of digits from 0 to 9. CIFAR-100 [Krizhevsky et al., 2009] (N = 10) is a benchmark composed of 10 ten-way classification tasks composed by splitting a CIFAR-100 dataset into ten tasks. The model is sequentially trained on the tasks, each with 6,000 images (5,000 for training and 1,000 for testing). ImageNet-R [Hendrycks et al., 2021], built upon the ImageNet dataset [Deng et al., 2009], features a diverse range of renditions of ImageNet classes. This benchmark includes a total of 30,000 images across 200 classes from ImageNet. For continual learning evaluation, ImageNet-R (N = 5, 10, 20) is formed by organizing ImageNet-R into 5, 10, and 20 tasks respectively, each containing 40, 20, and 10 classes and around 6000, 3000, 1500 samples (roughly 5/6 for training and 1/6 for testing).
Hardware Specification	Yes	For ResNet, the experiments on CIFAR-100 are conducted on NVIDIA GeForce RTX 3090 GPUs with 100 epochs of training (unless early-stop), and the ImageNet-R experiments are carried out on NVIDIA A800 or A100 GPUs with 200 epochs of training (unless early-stop). For ViT, all experiments are conducted on NVIDIA A800 or A100 GPUs until convergence.
Software Dependencies	No	The paper mentions using Adam optimizer, but no specific version number is provided for any software or library dependencies used to implement the methods.
Experiment Setup	Yes	To ensure a fair comparison, we adopt consistent training settings across all baseline methods we implement (unless listed separately). Specifically, the batch size is set to 32, and we use the Adam optimizer with an initial learning rate of 0.001. The learning rate is decayed by a factor of 10 after the 50th and 75th epochs. A weight decay of 1 × 10^-4 is applied. For robustness, each experiment is run three times with different random seeds 22, 32, and 42, and the results are averaged. [...] For H-embedding guided hypernet, the learning rate is set to 0.0005, with the embedding loss beta and CL loss beta both set to 0.05. The scheduling and transforming strategies are set the same as those of von Oswald et al. [2020] and the embedding dimension is also set to 32. For the learning of H-embedding, we update Eqn. 5 using gradient descent for 2000 iterations and the f, g in HGR maximal correlation for 100 epochs respectively, both using a subset of 1000 samples from the training set. [...] For the H-embedding guided hypernet with LoRA, the learning rate is set to 0.001, with both the embedding loss beta and CL loss beta set to 0.05. The LoRA rank and alpha are configured at 16, resulting in about 0.7 million parameters generated by the hypernetwork, which is only 0.86% of the full ViT model's 86 million parameters. The LoRA dropout rate is set to 0.1. For CIFAR-100, a batch size of 32 and 15 epochs are used, while for ImageNet-R, a batch size of 128 and 60 epochs are used.
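
The baseline settings and task splits quoted in the table can be sketched in a few lines of plain Python. This is an illustrative reading of the quoted text, not the authors' implementation: the names `BaselineConfig`, `lr_at_epoch`, and `split_classes` are hypothetical, epochs are assumed to be 0-indexed (so the first decay takes effect from epoch index 50), the weight decay is read as 1e-4, and classes are assumed to be assigned to tasks in sequential order, which the released repository may not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Hypothetical container for the baseline settings quoted above."""
    batch_size: int = 32
    base_lr: float = 1e-3          # initial Adam learning rate
    weight_decay: float = 1e-4     # weight decay, read here as 1e-4
    milestones: tuple = (50, 75)   # decay after the 50th and 75th epochs
    gamma: float = 0.1             # "decayed by a factor of 10"
    seeds: tuple = (22, 32, 42)    # three runs, results averaged


def lr_at_epoch(epoch: int, cfg: BaselineConfig = BaselineConfig()) -> float:
    """Learning rate in effect during a 0-indexed epoch (assumed convention)."""
    passed = sum(1 for m in cfg.milestones if epoch >= m)
    return cfg.base_lr * cfg.gamma ** passed


def split_classes(num_classes: int, num_tasks: int) -> list:
    """Split class ids into equally sized tasks, in sequential order (assumed)."""
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]


if __name__ == "__main__":
    for epoch in (0, 49, 50, 75):
        print(epoch, lr_at_epoch(epoch))
    # CIFAR-100: 10 tasks of 10 classes; ImageNet-R: 5, 10, or 20 tasks
    print(len(split_classes(100, 10)), len(split_classes(200, 5)[0]))
```

Under these assumptions, the step schedule matches the usual milestone-style decay, and the split sizes agree with the 40, 20, and 10 classes per task quoted for ImageNet-R with 5, 10, and 20 tasks.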