Few-shot Learning for Multi-label Intent Detection
Authors: Yutai Hou, Yongkui Lai, Yushan Wu, Wanxiang Che, Ting Liu (pp. 13036-13044)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings. |
| Researcher Affiliation | Academia | Yutai Hou, Yongkui Lai, Yushan Wu, Wanxiang Che*, Ting Liu; School of Computer Science and Technology, Harbin Institute of Technology, China; {ythou, yklai, car, tliu}@ir.hit.edu.cn, wuyushan@hit.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and code are available at https://github.com/AtmaHou/FewShotMultiLabel. |
| Open Datasets | Yes | We conduct experiments on the public dataset TourSG (Williams et al. 2012) and introduce a new multi-intent dataset, StanfordLU. These two datasets contain multiple domains and thus allow us to simulate the few-shot situation on unseen domains. TourSG (DSTC-4) contains 25,751 utterances annotated with multiple dialogue acts and 6 separate domains... StanfordLU is a re-annotated version of the Stanford dialogue dataset (Eric et al. 2017) containing 8,038 user utterances from 3 domains: Schedule (Sc), Navigate (Na), Weather (We). |
| Dataset Splits | Yes | Each time, we pick one target domain for testing, one domain for development, and use the remaining domains of the same dataset as source domains for training. For example, on the TourSG dataset, in each round the model is trained on 4 * 100 * 16 = 6400 samples, validated on 1 * 50 * 16 = 1600 samples, and tested on 1 * 50 * 16 = 800 samples. (A split-construction sketch follows the table.) |
| Hardware Specification | No | The paper mentions using Electra-small (14M params) and BERT-base (110M params) as embedders but does not provide specific details about the hardware (e.g., GPU, CPU models) used for experiments. |
| Software Dependencies | No | The paper mentions using BERT, Electra-small, and ADAM optimizer but does not specify software versions for programming languages, libraries, or frameworks (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We use ADAM (Kingma and Ba 2015) to train the models with batch size 4. Learning rate is set as 1e-5 for both our model and baseline models. We set α (Eq. 1) as 0.3 and vary β (Eq. 2) in {0.1, 0.5, 0.9} considering the label name's anchoring power with different datasets and support-set sizes. For the MLP of kernel regression, we employ ReLU as the activation function and vary the layers in {1, 2, 3} and hidden dimension in {5, 10, 20}. The best hyperparameters are determined on the development domains. (A configuration sketch follows the table.) |
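
The leave-one-domain-out protocol quoted in the Dataset Splits row can be made concrete with a minimal sketch. The `cross_domain_split` helper, the episode dictionary, and the placeholder domain names below are illustrative assumptions, not the authors' data-loading code; the paper only specifies that one domain is held out for testing, one for development, and the rest are pooled for training.

```python
# Minimal sketch of the leave-one-domain-out split described above.
# The episode dictionary and domain names are illustrative placeholders.
from typing import Dict, List, Tuple

def cross_domain_split(
    episodes_by_domain: Dict[str, List[dict]],
    test_domain: str,
    dev_domain: str,
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Hold out one domain for testing, one for development,
    and pool every remaining domain as training source data."""
    assert test_domain != dev_domain
    train, dev, test = [], [], []
    for domain, episodes in episodes_by_domain.items():
        if domain == test_domain:
            test.extend(episodes)
        elif domain == dev_domain:
            dev.extend(episodes)
        else:
            train.extend(episodes)
    return train, dev, test

# Usage with six placeholder domains standing in for TourSG's six domains:
toursg_episodes = {f"domain_{i}": [] for i in range(6)}
train_eps, dev_eps, test_eps = cross_domain_split(
    toursg_episodes, test_domain="domain_0", dev_domain="domain_1")
```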
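
The Experiment Setup row can likewise be read as a small PyTorch configuration. The sketch below is an assumption-laden illustration: `build_kernel_mlp` and the meaning of its scalar output are hypothetical, and in the paper the same Adam optimizer also updates the BERT/Electra embedder, which is omitted here. The paper itself only states that the kernel-regression MLP uses ReLU with 1-3 layers and 5/10/20 hidden units.

```python
# Minimal sketch of the quoted training configuration, assuming PyTorch.
import torch
import torch.nn as nn

def build_kernel_mlp(in_dim: int, hidden_dim: int = 10, n_layers: int = 2) -> nn.Sequential:
    """Small ReLU MLP used inside the kernel-regression component (hypothetical shape)."""
    layers, dim = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, 1))  # scalar output (assumed, e.g. a regressed threshold)
    return nn.Sequential(*layers)

mlp = build_kernel_mlp(in_dim=768)  # 768 = BERT-base hidden size
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-5)  # paper: ADAM, learning rate 1e-5
batch_size = 4                       # paper: batch size 4
alpha = 0.3                          # Eq. 1 weight, fixed at 0.3
beta = 0.5                           # Eq. 2 weight, searched over {0.1, 0.5, 0.9}
```

The depth and width defaults above correspond to the middle of the paper's search grid; the best values are selected on the development domains.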