Few-shot Learning for Multi-label Intent Detection

Authors: Yutai Hou, Yongkui Lai, Yushan Wu, Wanxiang Che, Ting Liu (pp. 13036-13044)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings."
Researcher Affiliation | Academia | "Yutai Hou, Yongkui Lai, Yushan Wu, Wanxiang Che*, Ting Liu. School of Computer Science and Technology, Harbin Institute of Technology, China. {ythou, yklai, car, tliu}@ir.hit.edu.cn, wuyushan@hit.edu.cn"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Data and code are available at https://github.com/AtmaHou/FewShotMultiLabel."
Open Datasets | Yes | "We conduct experiments on the public dataset TourSG (Williams et al. 2012) and introduce a new multi-intent dataset, StanfordLU. These two datasets contain multiple domains and thus allow us to simulate the few-shot situation on unseen domains. TourSG (DSTC-4) contains 25,751 utterances annotated with multiple dialogue acts and 6 separate domains... StanfordLU is a re-annotated version of the Stanford dialogue dataset (Eric et al. 2017) containing 8,038 user utterances from 3 domains: Schedule (Sc), Navigate (Na), Weather (We)."
Dataset Splits | Yes | "Each time, we pick one target domain for testing, one domain for development, and use the remaining domains of the same dataset as source domains for training. For example, on the TourSG dataset, in each round the model is trained on 4 * 100 * 16 = 6400 samples, validated on 1 * 50 * 16 = 1600 samples, and tested on 1 * 50 * 16 = 800 samples."
Hardware Specification | No | The paper mentions using Electra-small (14M params) and BERT-base (110M params) as embedders but does not provide specific details about the hardware (e.g., GPU, CPU models) used for experiments.
Software Dependencies | No | The paper mentions using BERT, Electra-small, and the ADAM optimizer but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python version, PyTorch version).
Experiment Setup | Yes | "We use ADAM (Kingma and Ba 2015) to train the models with batch size 4. The learning rate is set to 1e-5 for both our model and the baseline models. We set α (Eq. 1) to 0.3 and vary β (Eq. 2) in {0.1, 0.5, 0.9}, considering the label name's anchoring power with different datasets and support-set sizes. For the MLP of the kernel regression, we employ ReLU as the activation function and vary the number of layers in {1, 2, 3} and the hidden dimension in {5, 10, 20}. The best hyperparameters are determined on the development domains."
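The tuned values above form a small grid (β, MLP layers, MLP hidden dimension) searched on the development domains. A minimal stdlib sketch of enumerating that grid, assuming the reported values; names such as `FIXED` and `SEARCH_SPACE` are illustrative, not from the paper:

```python
from itertools import product

# Settings held fixed across runs, as reported in the paper.
FIXED = {
    "optimizer": "Adam",
    "batch_size": 4,
    "learning_rate": 1e-5,
    "alpha": 0.3,           # Eq. 1
    "activation": "ReLU",   # activation of the kernel-regression MLP
}

# Values tuned on the development domains.
SEARCH_SPACE = {
    "beta": [0.1, 0.5, 0.9],         # Eq. 2, label-name anchoring power
    "mlp_layers": [1, 2, 3],
    "mlp_hidden_dim": [5, 10, 20],
}

def candidate_configs():
    """Yield one full config dict per point in the hyperparameter grid."""
    keys = list(SEARCH_SPACE)
    for values in product(*(SEARCH_SPACE[k] for k in keys)):
        cfg = dict(FIXED)
        cfg.update(zip(keys, values))
        yield cfg

# The grid has 3 * 3 * 3 = 27 candidate configurations in total.
```

Under these assumptions, dev-domain model selection amounts to training once per yielded config and keeping the best-scoring one.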