Learning to Transfer with von Neumann Conditional Divergence
Authors: Ammar Shaker, Shujian Yu, Daniel Oñoro-Rubio
AAAI 2022, pp. 8231–8239
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our MDD on four real-world datasets: (i) the Amazon review dataset, ..., and (iv) the relative location of CT slices on the axial axis dataset (Graf et al. 2011). We compare our approach with the baseline elastic weight consolidation (EWC) (Kirkpatrick, Pascanu et al. 2017) and three other SOTA methods on five benchmark datasets. Empirical results demonstrate that our approach reduces catastrophic forgetting and is less sensitive to the choice of hyper-parameters. |
| Researcher Affiliation | Collaboration | Ammar Shaker1*, Shujian Yu2,3*, Daniel Oñoro-Rubio1 1 NEC Laboratories Europe, Heidelberg, Germany 2 UiT The Arctic University of Norway, Tromsø, Norway 3 Xi'an Jiaotong University, Xi'an, Shaanxi, China ammar.shaker@neclab.eu, yusj9011@gmail.com, daniel.onoro@neclab.eu |
| Pseudocode | Yes | We term our method the multi-source domain adaptation with matrix-based discrepancy distance (MDD) (pseudocode in the supplementary material). A hedged sketch of the underlying von Neumann divergence appears after this table. |
| Open Source Code | No | The paper states 'Supplementary material is available in our Arxiv version https://arxiv.org/abs/2108.03531' and mentions '(pseudocode in the supplementary material)' but does not explicitly state that the source code for the described methodology is provided or link directly to a code repository. |
| Open Datasets | Yes | We evaluate our MDD on four real-world datasets: (i) the Amazon review dataset, (ii) TRANCOS, a public benchmark for extremely overlapping vehicle counting, (iii) the Year Prediction MSD data (Bertin-Mahieux et al. 2011), and (iv) the relative location of CT slices on the axial axis dataset (Graf et al. 2011). We evaluate on the following datasets: (i) MNIST Permutations (mnist P) (Kirkpatrick, Pascanu et al. 2017), (ii) MNIST Rotations (mnist R) (Lopez-Paz and Ranzato 2017), (iii) Permuted Fashion-MNIST (fashion P) (Han, Kashif, and Roland 2017), and (iv) Permuted notMNIST (notmnist P). All these datasets contain images of size 28 × 28 pixels. Additionally, we also perform a comparison on the Omniglot dataset (Lake et al. 2011)... |
| Dataset Splits | No | The paper states 'Each domain is used once as target and the remaining as sources' which describes a specific experimental setup rather than a train/validation/test split with percentages or counts. It also mentions 'A grid-based hyperparameter search is carried on for each method on each dataset as explained in the supplementary material' which implies validation, but no specific split details are provided in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as the 'Adam optimizer', 'SGD', and 'ReLU activation', but does not provide specific version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The Adam optimizer is used with learning rate lr = 0.001 and a batch size of 300. We use 30 training epochs and perform 5 independent runs. We use a single-head fully-connected neural network with two hidden layers, each with 100 neurons, a 28 × 28 input layer, and an output layer with a single head of 10 units. We employ the online setting with a restricted memory budget of ten samples per task. In our experiments, we fix the number of groups to be Kd = 20. A code sketch of this setup follows the table. |
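
The method row above names a matrix-based discrepancy built on the von Neumann divergence from the paper's title. As context, here is a minimal PyTorch sketch of the standard (unconditional) von Neumann matrix divergence, D(A‖B) = tr(A log A − A log B − A + B), between symmetric positive-definite matrices. The function name `von_neumann_divergence`, the `eps` jitter, and the covariance construction in the usage lines are illustrative assumptions; the paper's conditional variant and the exact MDD computation are given only in its supplementary material.

```python
import torch

def von_neumann_divergence(A: torch.Tensor, B: torch.Tensor,
                           eps: float = 1e-6) -> torch.Tensor:
    """Von Neumann matrix divergence D(A || B) = tr(A log A - A log B - A + B)
    for symmetric positive-definite A and B (hypothetical helper)."""
    # Symmetric eigendecompositions; clamping keeps the matrix logarithm finite.
    la, Ua = torch.linalg.eigh(A)
    lb, Ub = torch.linalg.eigh(B)
    log_A = Ua @ torch.diag(torch.log(la.clamp_min(eps))) @ Ua.T
    log_B = Ub @ torch.diag(torch.log(lb.clamp_min(eps))) @ Ub.T
    return torch.trace(A @ (log_A - log_B) - A + B)

# Illustrative usage: divergence between covariance matrices of two feature
# batches (an assumption; the paper works with conditional divergences
# rather than plain feature covariances).
X, Y = torch.randn(300, 100), torch.randn(300, 100)
A = X.T @ X / X.shape[0] + 1e-3 * torch.eye(100)
B = Y.T @ Y / Y.shape[0] + 1e-3 * torch.eye(100)
print(von_neumann_divergence(A, B))
```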
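
The experiment-setup row translates directly into code. Below is a minimal sketch, assuming PyTorch (the paper does not name its framework) and the quoted architecture: a single-head fully-connected network with a 28 × 28 input, two 100-unit ReLU hidden layers, a 10-unit output head, and Adam with lr = 0.001.

```python
import torch
import torch.nn as nn

# Single-head fully-connected network from the experiment-setup row:
# 28 x 28 input, two hidden layers of 100 neurons with ReLU, and a
# single output head with 10 units.
model = nn.Sequential(
    nn.Flatten(),            # 28 x 28 image -> 784-dimensional vector
    nn.Linear(28 * 28, 100),
    nn.ReLU(),
    nn.Linear(100, 100),
    nn.ReLU(),
    nn.Linear(100, 10),      # single head, 10 units
)

# Adam with the reported learning rate; the batch size of 300 and the
# 30 training epochs belong in the (unspecified) training loop.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```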