Handling Learnwares from Heterogeneous Feature Spaces with Explicit Label Exploitation

Authors: Peng Tan, Hai-Tian Liu, Zhi-Hao Tan, Zhi-Hua Zhou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that, even without a model explicitly tailored to user tasks, the system can effectively handle tasks by leveraging models from diverse feature spaces.
Researcher Affiliation | Academia | Peng Tan, Hai-Tian Liu, Zhi-Hao Tan, Zhi-Hua Zhou; National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; {tanp,liuht,tanzh,zhouzh}@lamda.nju.edu.cn
Pseudocode | Yes | The overall procedure of the heterogeneous learnware dock system consists of two stages. In the submission stage, the dock system receives models with developer-level specifications sketching model capabilities and assigns system-level specifications using a learned unified subspace. In the deployment stage, users submit task requirements detailing marginal and conditional distributions to receive the recommended learnware. This learnware can be integrated with their self-trained models to significantly enhance performance. The detailed procedures for each stage are outlined in Algorithms 1 and 2, respectively.
Open Source Code | Yes | The code can be found at https://github.com/LAMDA-TP/Hetero-Learnware-Label-Info.
Open Datasets | Yes | Datasets. We tested our methods on 30 datasets from the TabZilla benchmark [McElfresh et al., 2023], excluding tiny datasets.
Dataset Splits | Yes | Each dataset is split into training and test sets with a 4:1 ratio [McElfresh et al., 2023]. The output of regression tasks is scaled to [0,1]. The feature space is randomly divided into four equal blocks. We create four feature spaces for developer tasks from all three-block combinations and six feature spaces for user tasks from all two-block combinations. Our encoder, decoder, and system classifier are two-layer ResNets [He et al., 2016] for tabular data, with subspace and hidden layer dimensions set to 16 and 32, respectively. We optimize using Adam [Kingma and Ba, 2015]. For user tasks, we sample 100 labeled data points from the training set, using stratified sampling for classification and binning for regression. The coefficients for contrastive, reconstruction, and supervised losses are set to 100, 1, and 1, respectively. Developers or users train the model using LightGBM [Ke et al., 2017] with grid search. All experiments are repeated five times.
Hardware Specification | Yes | Experiments were conducted using a Tesla A100 80GB GPU, two Intel Xeon Platinum 8358 CPUs with 32 cores each (base clock 2.6 GHz, turbo boost 3.4 GHz), and 512 GB of RAM.
Software Dependencies | Yes | TransTab: We employ the official version of TransTab v0.0.5, testing it in a contrastive learning setting.
Experiment Setup | Yes | Our encoder, decoder, and system classifier are two-layer ResNets [He et al., 2016] for tabular data, with subspace and hidden layer dimensions set to 16 and 32, respectively. We optimize using Adam [Kingma and Ba, 2015]. The hyper-parameter search space used for the developer and user LightGBM models consists of a list of specific combinations over the parameters learning_rate, num_leaves, and max_depth: (0.015, 224, 66), (0.005, 300, 50), (0.01, 128, 80), (0.15, 224, 80), and (0.01, 300, 66).
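
To make the two-stage procedure quoted in the Pseudocode row concrete, here is a minimal Python sketch of a submission/deployment loop. It is not the authors' code: the class, the distance-based matching, and the averaging reuse rule are illustrative assumptions standing in for Algorithms 1 and 2 and the paper's own specification and reuse methods.

```python
# Illustrative sketch (not the released implementation) of the two-stage
# workflow described in the "Pseudocode" row. All names are hypothetical.
import numpy as np

class HeteroLearnwareDock:
    def __init__(self, subspace_encoder):
        # subspace_encoder: maps specifications from any developer feature
        # space into the learned unified subspace (dimension 16 in the paper's setup).
        self.subspace_encoder = subspace_encoder
        self.learnwares = []  # list of (model, system-level specification) pairs

    # --- Submission stage (cf. Algorithm 1) ---
    def submit(self, model, developer_specification):
        # Project the developer-level specification into the unified subspace
        # to obtain a system-level specification, then store the learnware.
        system_spec = self.subspace_encoder(developer_specification)
        self.learnwares.append((model, system_spec))

    # --- Deployment stage (cf. Algorithm 2) ---
    def recommend(self, user_requirement):
        # user_requirement: a specification sketching the user task's marginal
        # and conditional distributions, projected into the same subspace.
        user_spec = self.subspace_encoder(user_requirement)
        # Assumption: recommend the learnware whose specification is closest.
        dists = [np.linalg.norm(spec - user_spec) for _, spec in self.learnwares]
        return self.learnwares[int(np.argmin(dists))][0]

def reuse(recommended_model, user_model, X_user):
    # One simple reuse strategy (an assumption, not the paper's method):
    # average the recommended learnware's predictions with the user's
    # self-trained model on the user's own feature space.
    return 0.5 * recommended_model.predict(X_user) + 0.5 * user_model.predict(X_user)
```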
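
The Open Datasets row points to the TabZilla benchmark, whose tasks are drawn from OpenML. A minimal sketch of pulling one such dataset with scikit-learn follows; the dataset name "adult" is illustrative and not a claim about which 30 datasets the paper selected.

```python
# Fetch one OpenML tabular dataset as a stand-in for a TabZilla task.
from sklearn.datasets import fetch_openml

X, y = fetch_openml(name="adult", version=2, return_X_y=True, as_frame=True)
print(X.shape, y.nunique())  # feature matrix shape and number of classes
```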
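
The Dataset Splits row fully specifies the data protocol. The sketch below walks through it under simple assumptions (synthetic data and a toy binary target): a 4:1 train/test split, [0,1] scaling for regression targets, a random four-block feature partition, developer spaces from all three-block combinations, user spaces from all two-block combinations, and 100 stratified labeled user samples.

```python
# Sketch of the quoted split protocol; variable names are illustrative.
from itertools import combinations

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # synthetic feature matrix
y = rng.integers(0, 2, size=1000)        # toy binary classification target
task_is_regression = False

# 4:1 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# For regression tasks, scale the output to [0, 1].
if task_is_regression:
    scaler = MinMaxScaler()
    y_train = scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
    y_test = scaler.transform(y_test.reshape(-1, 1)).ravel()

# Randomly divide the feature space into four equal blocks.
perm = rng.permutation(X_train.shape[1])
blocks = np.array_split(perm, 4)

# Four developer feature spaces (all 3-block combinations) and
# six user feature spaces (all 2-block combinations).
developer_spaces = [np.concatenate([blocks[i] for i in c]) for c in combinations(range(4), 3)]
user_spaces = [np.concatenate([blocks[i] for i in c]) for c in combinations(range(4), 2)]

# 100 labeled points for a user task; stratified sampling for classification
# (the quoted text uses binning instead for regression targets).
_, X_user, _, y_user = train_test_split(
    X_train, y_train, test_size=100,
    stratify=None if task_is_regression else y_train, random_state=0,
)
```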
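
The Experiment Setup row pins the LightGBM search space to five explicit (learning_rate, num_leaves, max_depth) combinations. Below is a minimal grid-search sketch over exactly those combinations; the validation split and accuracy criterion are assumptions, since the quoted text does not say how the best combination is selected.

```python
# Grid search over the five listed LightGBM configurations.
import lightgbm as lgb
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

SEARCH_SPACE = [  # (learning_rate, num_leaves, max_depth), as listed in the paper
    (0.015, 224, 66),
    (0.005, 300, 50),
    (0.01, 128, 80),
    (0.15, 224, 80),
    (0.01, 300, 66),
]

def fit_lightgbm_with_grid_search(X, y):
    # Assumption: hold out 20% for validation and keep the best-scoring model.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    best_model, best_score = None, -np.inf
    for lr, num_leaves, max_depth in SEARCH_SPACE:
        model = lgb.LGBMClassifier(
            learning_rate=lr, num_leaves=num_leaves, max_depth=max_depth
        )
        model.fit(X_tr, y_tr)
        score = accuracy_score(y_val, model.predict(X_val))
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```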