Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

Authors: Jie Han, Yixiong Zou, Haozhao Wang, Jun Wang, Wei Liu, Yao Wu, Tao Zhang, Ruixuan Li

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Snips and FewJoint datasets show that the method achieves state-of-the-art performance. It improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.
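The joint accuracy metric reported above is, as commonly defined in NLU evaluation, the fraction of utterances for which both the predicted intent and the entire predicted slot sequence match the gold annotation. A minimal sketch (the function name and data layout are illustrative, not from the paper):

```python
def joint_accuracy(intent_preds, intent_golds, slot_preds, slot_golds):
    """Fraction of utterances where the intent AND every slot label are correct.

    slot_preds / slot_golds are per-utterance label sequences (e.g. BIO tags).
    """
    correct = 0
    for ip, ig, sp, sg in zip(intent_preds, intent_golds, slot_preds, slot_golds):
        if ip == ig and sp == sg:  # utterance counts only if fully correct
            correct += 1
    return correct / len(intent_golds)

# Toy example: the second utterance has the wrong intent, so only 1 of 2 counts.
acc = joint_accuracy(
    ["GetWeather", "PlayMusic"],
    ["GetWeather", "RateBook"],
    [["O", "B-city"], ["O", "O"]],
    [["O", "B-city"], ["O", "O"]],
)  # -> 0.5
```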
Researcher Affiliation | Collaboration | Jie Han1, Yixiong Zou1*, Haozhao Wang1, Jun Wang2, Wei Liu1, Yao Wu3, Tao Zhang3, Ruixuan Li1* — 1School of Computer Science and Technology, Huazhong University of Science and Technology; 2iWudao Tech; 3Banma Network Technology. {jiehan, yixiongz, hz_wang, idc_lw, rxli}@hust.edu.cn, jwang@iwudao.tech, {qifang.wy, billow.zhangt}@alibaba-inc.com
Pseudocode | No | The paper describes its method and modules (I2S-Mask, Masked Slot Decoding) but does not present them in the form of structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | Yes | We conduct extensive experiments on two natural language understanding (NLU) benchmarks: the Snips dataset (Coucke et al. 2018) and the FewJoint dataset (Hou et al. 2020b).
Dataset Splits | Yes | In our experiments, we construct the training domains as source domains to update model parameters, the developing domains to select the best model, and the testing domains as target domains to evaluate performance. For Snips, we construct a training domain with 3 intent classes (PlayMusic, AddToPlaylist, BookRestaurant), a developing domain with 2 intent classes (RateBook, SearchScreeningEvent), and a testing domain with 2 intent classes (GetWeather, SearchCreativeWork). For FewJoint, we construct 38 training domains, 5 developing domains, and 9 testing domains. Here, each domain has multiple intent labels and slot labels. For the quantity of episodes in each domain, on Snips, we construct 200, 50, and 10 episodes for the training, developing, and testing domains, respectively.
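Few-shot episodes of the kind counted above are typically built by sampling, per class in a domain, K support examples plus a disjoint query set. A minimal sketch, assuming a hypothetical `data_by_class` mapping from label to utterances (the function and its arguments are illustrative, not the paper's code):

```python
import random

def sample_episode(data_by_class, k_shot, n_query, seed=None):
    """Draw a K-shot episode: for each class, K support examples and
    n_query query examples, sampled without overlap."""
    rng = random.Random(seed)
    support, query = [], []
    for label, examples in data_by_class.items():
        picked = rng.sample(examples, k_shot + n_query)  # no repeats
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy domain with 2 intent classes, 1-shot with 1 query example per class
domain = {"GetWeather": ["u1", "u2", "u3"], "SearchCreativeWork": ["u4", "u5"]}
support, query = sample_episode(domain, k_shot=1, n_query=1, seed=0)
```

Sampling without replacement per class keeps the support and query sets disjoint, which is what makes the query set a fair test of within-episode generalization.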
Hardware Specification | Yes | We conduct the 1-shot experiments on a GeForce GTX 1080 and the 5-shot experiments on a GeForce RTX 3090.
Software Dependencies | No | The paper mentions using a 'BERT (Devlin et al. 2019) based encoding function' and 'ADAM (Kingma and Ba 2015) to update model parameters', but does not specify version numbers for any software, libraries, or frameworks used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | The batch size is 4 and the learning rate is 10^-5. We set a single BERT (Devlin et al. 2019) as the embedding function of intents and slots, where the intent representation of an utterance is the average of all word embeddings in the utterance. The weight of the intent score λ in Eq. 7 is 1. We use ADAM (Kingma and Ba 2015) to update model parameters and transfer the learned general semantic representation from source domains to target domains without fine-tuning.
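The utterance-level intent representation described in the setup — the average of all word embeddings in the utterance — can be sketched as follows. The toy array stands in for BERT token outputs of shape [num_tokens, hidden]; the function name is illustrative, not from the paper:

```python
import numpy as np

def intent_representation(token_embeddings):
    """Average the per-token embeddings of an utterance into a single
    fixed-size vector, as described for the intent representation."""
    return np.asarray(token_embeddings).mean(axis=0)

# Toy "BERT" outputs for a 3-token utterance with hidden size 4
emb = np.array([[1.0, 0.0, 2.0, 0.0],
                [3.0, 0.0, 0.0, 0.0],
                [2.0, 0.0, 4.0, 0.0]])
rep = intent_representation(emb)  # -> array([2., 0., 2., 0.])
```

Mean pooling over tokens yields one vector per utterance regardless of its length, which is what lets intents of variable-length utterances be compared in a common embedding space.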