Probing Product Description Generation via Posterior Distillation

Authors: Haolan Zhan, Hainan Zhang, Hongshen Chen, Lei Shen, Zhuoye Ding, Yongjun Bao, Weipeng Yan, Yanyan Lan
Pages: 14301-14309

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our model is superior to traditional generative models in both automatic indicators and human evaluation.
Researcher Affiliation | Collaboration | 1 Institute of Software, Chinese Academy of Sciences, Beijing, China; 2 Data Science Lab, JD.com, Beijing, China; 3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 4 University of Chinese Academy of Sciences, Beijing, China
Pseudocode | No | The paper describes the model architecture and components textually and with a diagram, but it does not include pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper mentions 'https://github.com/OpenNMT/OpenNMT-py', a third-party framework used for implementation, and 'https://github.com/jddsl/JDPDG', which hosts the dataset. There is no explicit statement or link providing the source code for the proposed model.
Open Datasets | Yes | We collect a large-scale Chinese product description generation dataset, named JDPDG, from JD.com, one of the biggest e-commerce platforms in China. This dataset contains 345,799 pairs of item content and description. https://github.com/jddsl/JDPDG
Dataset Splits | Yes | Table 2 (data statistics for the proposed JDPDG dataset): Shoes&Clothes: 135,941 training / 4,000 validation / 4,000 test pairs; Digital: 100,236 / 4,000 / 4,000; Homing: 85,622 / 4,000 / 4,000.
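As a quick consistency check, the per-category split sizes in Table 2 sum to the 345,799-pair total quoted in the Open Datasets row above. A minimal Python sketch (the dictionary layout is our own; category and split names are taken from the table):

```python
# Sanity check: Table 2 split sizes should sum to the 345,799 item-description
# pairs reported for the JDPDG dataset.
splits = {
    "Shoes&Clothes": {"train": 135_941, "valid": 4_000, "test": 4_000},
    "Digital":       {"train": 100_236, "valid": 4_000, "test": 4_000},
    "Homing":        {"train": 85_622,  "valid": 4_000, "test": 4_000},
}

total = sum(n for category in splits.values() for n in category.values())
print(total)            # 345799
assert total == 345_799  # matches the reported dataset size
```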
Hardware Specification | Yes | We implement our model in OpenNMT and train all models on Tesla P40 GPUs with PyTorch (Paszke et al. 2019).
Software Dependencies | No | The paper mentions 'OpenNMT' and 'PyTorch' as software used for implementation, but it does not specify version numbers for either.
Experiment Setup | Yes | For experimental models, the hidden units of all transformer-based models are set to 512 and the feed-forward hidden size is set to 1,024. The beam search size is set to 5 and the length penalty to α = 0.4 (Wu et al. 2016). For LSTM-based models, the word dimension is set to 300 and the hidden size to 256 for both the encoder and decoder. The dropout rate and smoothing factor are set to 0.1 (Fabbri et al. 2019). The initial learning rate is set to 0.001. β1 = 0.9 and β2 = 0.998 are used for gradient optimization. We also apply a warm-up trick over the first 8,000 steps, with decay as in Vaswani et al. (2017). For hyper-parameters, we set γ1, β and α to 0.5, 0.4 and 0.5, respectively.
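For reference, the reported settings can be collected into a single configuration, together with a warm-up/decay schedule of the shape described in Vaswani et al. (2017). This is a minimal sketch: the optimizer choice (an Adam-style optimizer, inferred from the β1/β2 values) and the exact coupling between the schedule and the 0.001 initial learning rate are assumptions, since the paper does not spell them out.

```python
# Reported hyper-parameters from the Experiment Setup row above.
config = {
    "transformer_hidden_size": 512,
    "transformer_ffn_size": 1024,
    "beam_size": 5,
    "length_penalty_alpha": 0.4,   # Wu et al. 2016
    "lstm_word_dim": 300,
    "lstm_hidden_size": 256,
    "dropout": 0.1,
    "label_smoothing": 0.1,
    "learning_rate": 0.001,
    "adam_beta1": 0.9,             # Adam-style optimizer assumed from the beta values
    "adam_beta2": 0.998,
    "warmup_steps": 8000,
    "gamma1": 0.5,                 # model-specific hyper-parameters gamma1, beta, alpha
    "beta": 0.4,
    "alpha": 0.5,
}

def warmup_then_decay_lr(step: int, d_model: int = 512,
                         warmup: int = 8000, base_lr: float = 1.0) -> float:
    """Warm-up followed by inverse-square-root decay, as in Vaswani et al. (2017).

    How this schedule is scaled against the reported 0.001 initial learning
    rate is an assumption; OpenNMT-py applies a schedule of this shape when
    its 'noam' decay method is selected.
    """
    step = max(step, 1)
    return base_lr * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```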