Mutual Transfer Learning for Massive Data
Authors: Ching-Wei Cheng, Xingye Qiao, Guang Cheng
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulated and real examples are analyzed to illustrate the usefulness of the proposed method. The empirical performance of the proposed approach is examined through simulation studies. The finite-sample properties of the proposed approach are evaluated in Section 5 via simulation experiments. Section 6 investigates the nClimDiv database to illustrate the practical usefulness of the proposed method. |
| Researcher Affiliation | Academia | ¹Department of Statistics, Purdue University; ²Department of Mathematical Sciences, Binghamton University. |
| Pseudocode | No | The paper describes the ADMM algorithm and its derivation in Section 3.1, but it does not provide a structured pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | NOAA's nClimDiv database was analyzed to demonstrate the MTL method's practical usefulness. The monthly average temperature is the response of interest. Available at ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/. |
| Dataset Splits | Yes | Data from 1895-2000 were used for training and data from 2001-2016 for testing. Recall that we did not choose λ to minimize the cross-validation prediction error, but used BIC in order to get a parsimonious model. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. |
| Software Dependencies | No | The paper mentions statistical methods (e.g., ADMM, MCP, SCAD, TLP) and general computational concepts, but it does not provide specific version numbers for any programming languages, libraries, or software packages used for implementation. |
| Experiment Setup | Yes | Table 1 summarizes nine simulation settings (with $p = 5$ global features and $q = 3$ heterogeneous features), and each has 100 replications, where the signal-to-noise ratio (SNR) is defined in Section S.10. The largest total sample size is 307,200. For simplicity, we consider equal unit sizes $n_i \equiv n$ for $i = 1, \ldots, M$. We let the numbers of units in the subgroups be $(M_1, \ldots, M_S) = 1_S + \mathrm{Multinomial}(M - S, 1_S/S)$. The coordinates of $\beta_0$ were generated from $\mathrm{Uniform}(-2, 2)$ independently. To mimic the different coefficient values for the heterogeneous features between subgroups, we generated $\alpha_0 = (\alpha_{1,0}^\top, \ldots, \alpha_{S,0}^\top)^\top$, where $\alpha_{s,0} = (\alpha_{s,0,1}, \alpha_{s,0,2}, \alpha_{s,0,3})^\top$, in a way to guarantee the minimal signal condition... Moreover, $u_i$ and $\varepsilon_i$ follow $N(0, 0.3I)$ and $N(0, I)$, respectively. Finally, $Y$ was generated from the oracle model. |
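
The random draws quoted in the Experiment Setup row can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code (the page reports that none is released): the function names, the values of `M`, `S` and `n`, and the dimensions assumed for u_i (length q) and ε_i (length n) are hypothetical. The construction of α_0 and the oracle model for Y are omitted because the excerpt does not specify them.

```python
import math
import random
from collections import Counter


def subgroup_sizes(M, S, rng):
    """Draw (M_1, ..., M_S) = 1_S + Multinomial(M - S, 1_S / S)."""
    if M < S:
        raise ValueError("need at least one unit per subgroup (M >= S)")
    # A Multinomial(M - S, uniform) draw is the count vector of M - S
    # independent uniform picks among the S subgroups.
    counts = Counter(rng.randrange(S) for _ in range(M - S))
    return [1 + counts[s] for s in range(S)]


def draw_beta0(p, rng):
    """Coordinates of beta_0 drawn independently from Uniform(-2, 2)."""
    return [rng.uniform(-2.0, 2.0) for _ in range(p)]


def draw_gaussian(dim, variance, rng):
    """One draw from N(0, variance * I) in `dim` dimensions."""
    sd = math.sqrt(variance)  # gauss() takes a standard deviation
    return [rng.gauss(0.0, sd) for _ in range(dim)]


rng = random.Random(0)
p, q = 5, 3            # from the source: global / heterogeneous features
M, S, n = 40, 4, 100   # illustrative values only (assumption)

sizes = subgroup_sizes(M, S, rng)
beta0 = draw_beta0(p, rng)
# Assumed dimensions: u_i has length q, eps_i has length n.
u = [draw_gaussian(q, 0.3, rng) for _ in range(M)]
eps = [draw_gaussian(n, 1.0, rng) for _ in range(M)]
```

Adding the vector of ones before the multinomial counts guarantees that every subgroup holds at least one unit, while the sizes still sum to M.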