Secure Out-of-Distribution Task Generalization with Energy-Based Models
Authors: Shengzhuang Chen, Long-Kai Huang, Jonathan Richard Schwarz, Yilun Du, Ying Wei
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on four regression and classification datasets demonstrate the effectiveness of our proposal. In the experiments, we test EBML on both few-shot regression and image classification tasks in search of answers to the following key questions. RQ1: Whether the improved expressiveness of EBML over traditional Bayesian meta-learning methods can lead to a more accurate model of the meta-training ID task distribution, and hence a more reliable OOD task detector. RQ2: Whether the Energy Sum can serve as an effective score for detecting OOD meta-testing tasks. RQ3: Whether EBML instantiated with SOTA algorithms can exploit the meta-learned EBM prior during OOD task adaptation to achieve better prediction performance on OOD tasks. |
| Researcher Affiliation | Collaboration | 1 City University of Hong Kong; 2 Tencent AI Lab; 3 University College London; 4 Massachusetts Institute of Technology; 5 Nanyang Technological University |
| Pseudocode | Yes | The complete pseudocode for meta-training of EBML is available in Appendix E, as is pseudocode for the EBML adaptation and inference algorithms described above. |
| Open Source Code | No | The paper does not provide an explicit statement about, or a link to, an open-source release of its code. |
| Open Datasets | Yes | We use the lbap-general-ic50-size ID/OOD task split in the DrugOOD [21] benchmark... Meta-Dataset [49] 5-way 1-shot classification: this experiment considers image classification problems on Meta-Dataset [49]. |
| Dataset Splits | No | The paper specifies task domains for meta-training and meta-testing (e.g., '222/145/23 domains by molecular size for ID Train / ID Test / OOD Test' for DrugOOD) and task structures (e.g., 'Each training task consists of 2 to 5 support and 10 query points'). However, it does not explicitly describe a separate validation split. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies or version numbers (e.g., Python, PyTorch, or TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | For more experimental details, hyper-parameter configurations, and additional experimental results, please refer to Appendix B and C. (Appendix B.2, 'Hyperparameters and Training Details', provides specific values for learning rates, batch sizes, optimizers, and training steps.) |