Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

Authors: Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that transformer-based models pretrained on knowledge base assimilation and other well-established pretraining tasks, then fine-tuned on our new parallel corpus, lead to considerable improvement on expert-layman transfer benchmarks, gaining an average relative improvement of 106% on our human evaluation metric, the Overall Success Rate (OSR).
Researcher Affiliation | Academia | University of California, Santa Barbara, Department of Computer Science, Santa Barbara, California, USA. {wendaxu,saxon}@ucsb.edu, {sra,william}@cs.ucsb.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at https://github.com/xu1998hz/SSL_KBA_Expert_Layman_Style_Transfer.
Open Datasets | Yes | We evaluate our proposed method and current SOTA models using the MSD dataset (Cao et al. 2020).
Dataset Splits | No | MSD contains 245k medical training sentences, each labeled with either the expert or layman style, and a test set of 675 expert-layman sentence pairs of equivalent meaning. The training set is extended with 11,512 sentence pairs produced using a margin-based criterion (Schwenk 2018); a sketch of one such criterion follows the table. The paper does not specify a validation split with percentages or counts, although early stopping is mentioned.
Hardware Specification | Yes | Parallel corpus generation took 7.5 hours on a single Titan 1080 Ti GPU. For different SSL task combinations, pretraining took 6 hours on average and fine-tuning took 1.5 hours on a single Titan 1080 Ti GPU.
Software Dependencies | No | We use the standard training settings for all models with the Adam optimizer (Kingma and Ba 2015)... We train a style classifier on the MSD training set using fastText (Joulin et al. 2016)... We also use NLTK (Bird, Klein, and Loper 2009) to calculate 4-gram BLEU... We use KenLM (Heafield 2011) to train a 5-gram language model... We use ClinicalBERT's (Huang, Altosaar, and Ranganath 2020) tokenization for all models. The paper names these software components but does not provide version numbers; an illustrative usage sketch follows the table.
Experiment Setup | Yes | Max sequence length, learning rate and dropout rate are set to 100, 1e-4 and 0.5 respectively. Our model architecture follows Dai et al. (2019), with 4 layers, 4 attention heads per layer, and hidden size 256. We add one style token with 256 hidden units into the input sequence after the embedding layer. Finally, we augment our expected best condition of KBA + SSL pretraining with KBA + SSL Large, identical to the other transformer models except for a hidden size of 512. An illustrative configuration sketch follows the table.
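
The margin-based criterion (Schwenk 2018) mentioned under Dataset Splits scores candidate sentence pairs by cosine similarity penalized by each sentence's average similarity to its nearest neighbors. Below is a minimal sketch of one common formulation of that idea, assuming fixed sentence embeddings have already been computed; the function names, the choice of k, and the thresholding step are illustrative, not the paper's exact procedure.

```python
# Minimal sketch of a margin-based pair-mining criterion (in the spirit of
# Schwenk 2018). Assumes precomputed sentence embeddings for the expert and
# layman sides; k and all names here are illustrative assumptions.
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def margin_scores(expert_emb: np.ndarray, layman_emb: np.ndarray, k: int = 4) -> np.ndarray:
    """Score each (expert, layman) pair by cosine similarity minus the mean
    similarity of each sentence to its k nearest neighbors on the other side."""
    sim = cosine_matrix(expert_emb, layman_emb)             # (n_expert, n_layman)
    nn_expert = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # avg sim of each expert sentence to its k-NN
    nn_layman = np.sort(sim, axis=0)[-k:, :].mean(axis=0)   # avg sim of each layman sentence to its k-NN
    return sim - (nn_expert[:, None] + nn_layman[None, :]) / 2.0

# Example: keep, for each expert sentence, its best layman match above a threshold.
scores = margin_scores(np.random.rand(50, 256), np.random.rand(60, 256))
best_match = scores.argmax(axis=1)
keep = scores.max(axis=1) > 0.05   # threshold chosen for illustration only
```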
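
For the components named under Software Dependencies, the sketch below shows how fastText, NLTK's BLEU, and KenLM are typically invoked in this kind of evaluation pipeline. The file names, training options, and example sentences are placeholders; nothing here pins the versions the paper actually used, which is the row's point.

```python
# Illustrative use of the evaluation components named in the report.
# File names ("msd_style_train.txt", "medical_5gram.arpa") are placeholders.
import fasttext                       # style classifier (Joulin et al. 2016)
import kenlm                          # 5-gram language model (Heafield 2011)
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction  # NLTK BLEU

# Style classifier trained on lines formatted as "__label__expert <sentence>".
style_clf = fasttext.train_supervised(input="msd_style_train.txt", epoch=25)
labels, probs = style_clf.predict("the patient presented with acute dyspnea")

# 4-gram BLEU between a system output and a reference, with smoothing.
hyp = "the patient had trouble breathing".split()
ref = "the patient was short of breath".split()
bleu4 = sentence_bleu([ref], hyp,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)

# Perplexity under a 5-gram KenLM model (ARPA file built separately with
# KenLM's lmplz tool).
lm = kenlm.Model("medical_5gram.arpa")
ppl = lm.perplexity("the patient had trouble breathing")
```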
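
The hyperparameters reported under Experiment Setup can be collected into a single configuration, sketched below as a same-sized Transformer encoder in PyTorch. This is a plausible reading of the reported numbers (a style-token model following Dai et al. 2019), not the authors' released implementation; the feed-forward width, vocabulary size, and the exact way the style token is injected are assumptions.

```python
# Hyperparameters as reported in the paper; the config container, the
# feed-forward width (assumed 4 * hidden), and the vocabulary size are
# illustrative choices.
import torch
import torch.nn as nn

config = {
    "num_layers": 4,        # transformer layers
    "num_heads": 4,         # attention heads per layer
    "hidden_size": 256,     # 512 for the "KBA + SSL Large" variant
    "dropout": 0.5,
    "max_seq_len": 100,
    "learning_rate": 1e-4,  # Adam (Kingma and Ba 2015)
}

encoder_layer = nn.TransformerEncoderLayer(
    d_model=config["hidden_size"],
    nhead=config["num_heads"],
    dim_feedforward=4 * config["hidden_size"],
    dropout=config["dropout"],
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=config["num_layers"])

# One learned style embedding (expert vs. layman) prepended to the token
# embeddings, matching the "one style token ... after the embedding layer" detail.
style_embedding = nn.Embedding(2, config["hidden_size"])
token_embedding = nn.Embedding(30522, config["hidden_size"])  # vocab size assumed

tokens = torch.randint(0, 30522, (1, config["max_seq_len"]))  # dummy batch
style = torch.tensor([0])                                     # 0 = expert, 1 = layman
x = torch.cat([style_embedding(style).unsqueeze(1), token_embedding(tokens)], dim=1)
hidden_states = encoder(x)

optimizer = torch.optim.Adam(encoder.parameters(), lr=config["learning_rate"])
```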