Handling Learnwares from Heterogeneous Feature Spaces with Explicit Label Exploitation
Authors: Peng Tan, Hai-Tian Liu, Zhi-Hao Tan, Zhi-Hua Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that, even without a model explicitly tailored to user tasks, the system can effectively handle tasks by leveraging models from diverse feature spaces. |
| Researcher Affiliation | Academia | Peng Tan, Hai-Tian Liu, Zhi-Hao Tan, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {tanp,liuht,tanzh,zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | The overall procedure of the heterogeneous learnware dock system consists of two stages. In the submission stage, the dock system receives models with developer-level specifications sketching model capabilities and assigns system-level specifications using a learned unified subspace. In the deployment stage, users submit task requirements detailing marginal and conditional distributions and receive the recommended learnware. This learnware can be integrated with their self-trained models to significantly enhance performance. The detailed procedures for the two stages are outlined in Algorithms 1 and 2, respectively. |
| Open Source Code | Yes | The code can be found at https://github.com/LAMDA-TP/Hetero-Learnware-Label-Info. |
| Open Datasets | Yes | Datasets. We tested our methods on 30 datasets from the Tabzilla benchmark [McElfresh et al., 2023], excluding tiny datasets. |
| Dataset Splits | Yes | Each dataset is split into training and test sets with a 4:1 ratio [McElfresh et al., 2023]. The output of regression tasks is scaled to [0, 1]. The feature space is randomly divided into four equal blocks. We create four feature spaces for developer tasks from all three-block combinations and six feature spaces for user tasks from all two-block combinations (a data-preparation sketch follows the table). Our encoder, decoder, and system classifier are two-layer ResNets [He et al., 2016] for tabular data, with subspace and hidden layer dimensions set to 16 and 32, respectively. We optimize using Adam [Kingma and Ba, 2015]. For user tasks, we sample 100 labeled data points from the training set, using stratified sampling for classification and binning for regression. The coefficients for the contrastive, reconstruction, and supervised losses are set to 100, 1, and 1, respectively. Developers or users train the model using LightGBM [Ke et al., 2017] with grid search. All experiments are repeated five times. |
| Hardware Specification | Yes | Experiments were conducted using a Tesla A100 80GB GPU, two Intel Xeon Platinum 8358 CPUs with 32 cores each (base clock 2.6 GHz, turbo boost 3.4 GHz), and 512 GB of RAM. |
| Software Dependencies | Yes | TransTab: We employ the official version of TransTab v0.0.5, testing it in a contrastive learning setting. |
| Experiment Setup | Yes | Our encoder, decoder, and system classifier are two-layer ResNets [He et al., 2016] for tabular data, with subspace and hidden layer dimensions set to 16 and 32, respectively. We optimize using Adam [Kingma and Ba, 2015]. The hyper-parameter search space used for the developer and user LightGBM models consists of a list of specific combinations over the parameters learning_rate, num_leaves, and max_depth: (0.015, 224, 66), (0.005, 300, 50), (0.01, 128, 80), (0.15, 224, 80), and (0.01, 300, 66) (see the model and grid-search sketches below the table). |
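
The split and feature-space construction described in the Dataset Splits row can be sketched roughly as follows. This is a minimal Python illustration assuming a generic numeric tabular dataset; the `prepare_splits` helper, the seeding scheme, and the use of scikit-learn's `train_test_split` are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of the reported data preparation; names and helper
# structure are illustrative, not the paper's implementation.
from itertools import combinations

import numpy as np
from sklearn.model_selection import train_test_split


def prepare_splits(X, y, seed=0, n_blocks=4, n_user_labeled=100, is_classification=True):
    # 4:1 train/test split, as reported for the Tabzilla datasets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed,
        stratify=y if is_classification else None)

    # Randomly divide the feature space into four equal blocks.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    blocks = np.array_split(perm, n_blocks)

    # Developer feature spaces: the 4 three-block combinations.
    developer_spaces = [np.concatenate(c) for c in combinations(blocks, 3)]
    # User feature spaces: the 6 two-block combinations.
    user_spaces = [np.concatenate(c) for c in combinations(blocks, 2)]

    # Each user task only sees 100 labeled training points; the paper uses
    # stratified sampling for classification and binning for regression
    # (plain random sampling is used here as a simplification).
    if is_classification:
        user_idx, _ = train_test_split(
            np.arange(len(y_train)), train_size=n_user_labeled,
            random_state=seed, stratify=y_train)
    else:
        user_idx = rng.choice(len(y_train), size=n_user_labeled, replace=False)

    return (X_train, y_train), (X_test, y_test), developer_spaces, user_spaces, user_idx
```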
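The reported subspace networks and loss weighting could look roughly like the sketch below. The paper's exact layer layout for the two-layer tabular ResNet is not reproduced here, so the residual-block design, the illustrative input/output dimensions, and the `total_loss` helper are assumptions; only the subspace dimension (16), hidden dimension (32), Adam optimizer, and loss coefficients (100, 1, 1) come from the rows above.

```python
# Illustrative PyTorch sketch of the encoder/decoder/system-classifier setup.
import torch
import torch.nn as nn

SUBSPACE_DIM, HIDDEN_DIM = 16, 32  # reported subspace / hidden sizes


class ResBlock(nn.Module):
    """Simple residual MLP block (an assumed design)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)  # residual connection


class TabularResNet(nn.Module):
    """Two residual blocks followed by a linear projection."""
    def __init__(self, in_dim, out_dim, hidden=HIDDEN_DIM):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.blocks = nn.Sequential(ResBlock(hidden, hidden), ResBlock(hidden, hidden))
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.out(self.blocks(self.inp(x)))


# Encoder maps a task's feature space into the unified subspace, the decoder
# reconstructs the original features, and the system classifier predicts
# labels from the subspace representation (dimensions here are placeholders).
encoder = TabularResNet(in_dim=30, out_dim=SUBSPACE_DIM)
decoder = TabularResNet(in_dim=SUBSPACE_DIM, out_dim=30)
classifier = TabularResNet(in_dim=SUBSPACE_DIM, out_dim=10)

params = list(encoder.parameters()) + list(decoder.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params)


def total_loss(l_contrastive, l_reconstruction, l_supervised):
    # Reported weighting: contrastive 100, reconstruction 1, supervised 1.
    return 100.0 * l_contrastive + 1.0 * l_reconstruction + 1.0 * l_supervised
```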
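The LightGBM grid search over the five listed (learning_rate, num_leaves, max_depth) combinations might be implemented along these lines; the cross-validation scoring loop and the `fit_best_lgbm` helper are assumptions made for illustration.

```python
# Sketch of the reported hyper-parameter search for developer/user LightGBM models.
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# The five (learning_rate, num_leaves, max_depth) combinations listed above.
PARAM_GRID = [
    (0.015, 224, 66),
    (0.005, 300, 50),
    (0.01, 128, 80),
    (0.15, 224, 80),
    (0.01, 300, 66),
]


def fit_best_lgbm(X_train, y_train, task="classification"):
    Model = lgb.LGBMClassifier if task == "classification" else lgb.LGBMRegressor
    best_score, best_model = -float("inf"), None
    for lr, leaves, depth in PARAM_GRID:
        model = Model(learning_rate=lr, num_leaves=leaves, max_depth=depth)
        score = cross_val_score(model, X_train, y_train, cv=5).mean()
        if score > best_score:
            best_score, best_model = score, model.fit(X_train, y_train)
    return best_model
```

The scoring metric defaults to accuracy for classification and R^2 for regression under `cross_val_score`; whether the authors selected models this way is not stated in the rows above.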