Transfer Learning via $\ell_1$ Regularization
Authors: Masaaki Takada, Hironori Fujisawa
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that the proposed method effectively balances stability and plasticity. |
| Researcher Affiliation | Collaboration | Masaaki Takada, Toshiba Corporation, Tokyo 105-0023, Japan (masaaki1.takada@toshiba.co.jp); Hironori Fujisawa, The Institute of Statistical Mathematics, Tokyo 190-8562, Japan (fujisawa@ism.ac.jp) |
| Pseudocode | Yes | We provide a coordinate descent algorithm for Transfer Lasso. It is guaranteed to converge to a global optimal solution [36], because the problem is convex and the penalty is separable. Let $\beta$ be the current value and $\tilde{\beta}$ the initial (source) estimate. Consider a new value of $\beta_j$ as a minimizer of $L(\beta; \tilde{\beta})$ when the elements of $\beta$ other than $\beta_j$ are fixed. We have $\partial_{\beta_j} L(\beta; \tilde{\beta}) = -\frac{1}{n} X_j^\top (y - X_{-j}\beta_{-j}) + \beta_j + \lambda\alpha\,\mathrm{sgn}(\beta_j) + \lambda(1-\alpha)\,\mathrm{sgn}(\beta_j - \tilde{\beta}_j) = 0$, where $X_j$ and $X_{-j}$ denote the $j$-th column of $X$ and $X$ without the $j$-th column, respectively, and $\mathrm{sgn}(\cdot)$ denotes the sign function. Hence we obtain the update rule $\beta_j \leftarrow T\left(\frac{1}{n} X_j^\top (y - X_{-j}\beta_{-j}),\ \lambda\alpha,\ \lambda(1-\alpha),\ \tilde{\beta}_j\right)$, where, for $b \ge 0$, $T(z, \gamma_1, \gamma_2, b) := \begin{cases} 0 & \text{for } -\gamma_1-\gamma_2 \le z \le \gamma_1-\gamma_2 \\ z - \gamma_1 + \gamma_2 & \text{for } \gamma_1-\gamma_2 < z < \gamma_1-\gamma_2+b \\ b & \text{for } \gamma_1-\gamma_2+b \le z \le \gamma_1+\gamma_2+b \\ z - (\gamma_1+\gamma_2)\,\mathrm{sgn}(z) & \text{otherwise,} \end{cases}$ and $T(z, \gamma_1, \gamma_2, b) = -T(-z, \gamma_1, \gamma_2, -b)$ for $b < 0$. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | The newsgroup message data (https://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html) comprises messages from Usenet posts on different topics. We basically followed the concept drift experiments in [18] and used preprocessed data (http://lpis.csd.auth.gr/mlkd/concept_drift.html). |
| Dataset Splits | Yes | The regularization parameters λ and α were determined by ten-fold cross validation. The examples were divided into 30 batches of 50 examples each, without changing the order of the samples. We trained models using each batch and evaluated them using the next batch. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software libraries or their version numbers used in the experiments. |
| Experiment Setup | Yes | The regularization parameters $\lambda$ and $\alpha$ were determined by ten-fold cross validation. The parameter $\lambda$ was selected from a decreasing sequence from $\lambda_{\max}$ to $\lambda_{\max} \times 10^{-4}$ in log-scale, where $\lambda_{\max}$ was calculated as in Section 3.2. The parameter $\alpha$ was selected among $\{0, 0.25, 0.5, 0.75, 1\}$. Each dataset was centered and standardized such that $\bar{y} = 0$, $\bar{X}_j = 0$, and $\mathrm{sd}(X_j) = 1$ in preprocessing. |
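The coordinate descent update quoted above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes columns of $X$ are centered and scaled so that $X_j^\top X_j / n = 1$ (as in the stated preprocessing), and all function and variable names are our own.

```python
import numpy as np

def soft_threshold_transfer(z, g1, g2, b):
    """Minimize 0.5*(t - z)**2 + g1*|t| + g2*|t - b| over t.

    Piecewise closed form for b >= 0; the b < 0 case follows by
    the symmetry T(z, g1, g2, b) = -T(-z, g1, g2, -b).
    """
    if b < 0:
        return -soft_threshold_transfer(-z, g1, g2, -b)
    if -g1 - g2 <= z <= g1 - g2:
        return 0.0                       # shrunk exactly to zero
    if g1 - g2 < z < g1 - g2 + b:
        return z - g1 + g2               # strictly between 0 and b
    if g1 - g2 + b <= z <= g1 + g2 + b:
        return b                         # snapped to the source estimate
    return z - (g1 + g2) * np.sign(z)    # beyond both kinks

def transfer_lasso_cd(X, y, beta_tilde, lam, alpha, n_iter=100):
    """Coordinate descent for the Transfer Lasso objective (sketch).

    Assumes X is centered with unit-variance columns, so that
    X[:, j] @ X[:, j] / n == 1 and the 1-D subproblem has the
    closed form above.
    """
    n, p = X.shape
    beta = beta_tilde.astype(float).copy()
    r = y - X @ beta                     # running residual
    for _ in range(n_iter):
        for j in range(p):
            # z_j = (1/n) X_j^T (y - X_{-j} beta_{-j})
            z = X[:, j] @ (r + X[:, j] * beta[j]) / n
            new_bj = soft_threshold_transfer(
                z, lam * alpha, lam * (1 - alpha), beta_tilde[j]
            )
            r += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta
```

Setting `alpha=1` recovers the ordinary Lasso update (plain soft-thresholding), while `alpha=0` penalizes only deviations from `beta_tilde`, which is why a large `lam` then pins the solution to the source estimate.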
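The preprocessing and $\lambda$ grid described in the setup row can be sketched as follows. The standardization matches the stated $\bar{y}=0$, $\bar{X}_j=0$, $\mathrm{sd}(X_j)=1$; the $\lambda_{\max}$ formula used here is the plain-Lasso bound $\max_j |X_j^\top y|/n$, a stand-in assumption, since the paper derives its own value in Section 3.2 which this excerpt does not reproduce.

```python
import numpy as np

def standardize(X, y):
    """Center y, then center each column of X and scale it to unit sd."""
    y_c = y - y.mean()
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0)
    return Xs, y_c

def lambda_grid(X, y, n_lambda=100):
    """Decreasing log-scale grid from lambda_max down to lambda_max * 1e-4.

    Uses the ordinary-Lasso lambda_max = max_j |X_j^T y| / n as a
    placeholder; the paper's Section 3.2 gives the exact value for
    Transfer Lasso.
    """
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n
    return np.logspace(np.log10(lam_max), np.log10(lam_max * 1e-4), n_lambda)
```

Cross validation would then loop this grid (together with $\alpha \in \{0, 0.25, 0.5, 0.75, 1\}$) over ten folds, as the setup row states.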