Probing Product Description Generation via Posterior Distillation
Authors: Haolan Zhan, Hainan Zhang, Hongshen Chen, Lei Shen, Zhuoye Ding, Yongjun Bao, Weipeng Yan, Yanyan Lan
AAAI 2021, pp. 14301-14309 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our model is superior to traditional generative models in both automatic indicators and human evaluation. |
| Researcher Affiliation | Collaboration | 1 Institute of Software, Chinese Academy of Sciences, Beijing, China; 2 Data Science Lab, JD.com, Beijing, China; 3 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 4 University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and components textually and with a diagram, but it does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper mentions 'https://github.com/OpenNMT/OpenNMT-py', a third-party framework used for implementation, and 'https://github.com/jddsl/JDPDG', which hosts the dataset. There is no explicit statement or link providing the source code for the proposed model. |
| Open Datasets | Yes | We collect a large-scale Chinese product description generation dataset, named JDPDG, from JD.com, one of the biggest e-commerce platforms in China. This dataset contains 345,799 pairs of item content and description. https://github.com/jddsl/JDPDG |
| Dataset Splits | Yes | Table 2 (data statistics for the proposed JDPDG dataset): Shoes&Clothes: 135,941 training / 4,000 validation / 4,000 test pairs; Digital: 100,236 / 4,000 / 4,000; Homing: 85,622 / 4,000 / 4,000. |
| Hardware Specification | Yes | We implement our model in OpenNMT and train all models on Tesla P40 GPUs with PyTorch (Paszke et al. 2019). |
| Software Dependencies | No | The paper mentions 'OpenNMT' and 'PyTorch' as software used for implementation, but it does not specify version numbers for either. |
| Experiment Setup | Yes | For experimental models, the hidden units of all transformer-based models are set as 512 and the feed-forward hidden size is set as 1,024. The beam search size is set as 5 and length penalty as α = 0.4 (Wu et al. 2016). For LSTM-based models, the word dimension is set to 300 and the hidden nodes are set as 256 for the encoder and decoder. The dropout rate and smoothing factor are set as 0.1 (Fabbri et al. 2019). The initial learning rate is set to 0.001. β1 = 0.9 and β2 = 0.998 are used for gradient optimization. We also apply a warm-up trick over the first 8,000 steps, and decay as in Vaswani et al. (2017). For hyper-parameters, we set γ1, β and α to 0.5, 0.4 and 0.5, respectively. |
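
The experiment setup above reports enough hyper-parameters to reconstruct most of the training configuration. Below is a minimal sketch of that setup in plain PyTorch, assuming the reported values; the head count, layer count, vocabulary size, and the exact Noam-style warm-up multiplier are assumptions not stated in the paper, and the authors' actual implementation was built on OpenNMT-py rather than this code.

```python
# Minimal, hedged sketch of the reported training configuration.
# Constants marked "(paper)" come from the Experiment Setup row above;
# everything marked "assumption" is illustrative only.
import torch
import torch.nn as nn

D_MODEL = 512          # hidden units of transformer-based models (paper)
FFN_SIZE = 1024        # feed-forward hidden size (paper)
DROPOUT = 0.1          # dropout rate (paper)
LABEL_SMOOTHING = 0.1  # smoothing factor (paper)
BASE_LR = 0.001        # initial learning rate (paper)
BETAS = (0.9, 0.998)   # beta1 / beta2 for gradient optimization (paper)
WARMUP_STEPS = 8000    # warm-up steps (paper)
VOCAB_SIZE = 32000     # assumption: vocabulary size is not reported

model = nn.Transformer(
    d_model=D_MODEL,
    nhead=8,                  # assumption: head count is not reported
    num_encoder_layers=6,     # assumption: layer count is not reported
    num_decoder_layers=6,     # assumption
    dim_feedforward=FFN_SIZE,
    dropout=DROPOUT,
    batch_first=True,
)

optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR, betas=BETAS)

def noam_factor(step: int) -> float:
    """Warm-up/decay multiplier in the style of Vaswani et al. (2017),
    which the paper cites for its learning-rate schedule."""
    step = max(step, 1)
    return (D_MODEL ** -0.5) * min(step ** -0.5, step * WARMUP_STEPS ** -1.5)

# LambdaLR scales BASE_LR by noam_factor(step), mirroring how OpenNMT-py
# combines a base learning rate with the noam decay method.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_factor)

# Label-smoothed cross entropy over the target vocabulary.
criterion = nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)
```

The reported beam size of 5 and length penalty α = 0.4 (Wu et al. 2016) are decoding-time settings for beam search, so they do not appear in this training sketch.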