Toward Diverse Text Generation with Inverse Reinforcement Learning
Authors: Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results demonstrate that our proposed method can generate higher quality texts than the previous methods. |
| Researcher Affiliation | Academia | Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; School of Computer Science, Fudan University |
| Pseudocode | Yes | Algorithm 1 IRL for Text Generation (a hedged sketch of this training loop appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the proposed Inverse Reinforcement Learning (IRL) method; it only links to the code of the baseline models (SeqGAN and LeakGAN). |
| Open Datasets | Yes | We experiment on three corpora: the synthetic oracle dataset [Yu et al., 2017], the COCO image caption dataset [Chen et al., 2015] and the IMDB movie review dataset [Diao et al., 2014]. |
| Dataset Splits | No | The paper states training/test splits (e.g., '80,000 texts as training set, and another 5,000 as test set' for COCO) but does not specify a separate validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | Table 1 gives the experimental settings on the three corpora. Text generator: embedding dimension 32/64/128, hidden-layer dimension 32/64/128, batch size 64/128, Adam optimizer with learning rate 0.005. Reward approximator: dropout 0.75/0.45/0.75, batch size 64/1024, Adam optimizer with learning rate 0.0004. (A transcription into a config sketch follows the table.) |
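
The paper's Algorithm 1 is not reproduced in this report, so the following is only a minimal PyTorch sketch of an alternating max-entropy IRL loop for text generation, matching the Pseudocode row above. The class names (`Generator`, `RewardApprox`), the toy dimensions, the log-sum-exp reward surrogate, and the entropy-regularized policy-gradient form are illustrative assumptions, not the authors' code; only the Adam learning rates (0.005 and 0.0004) come from the quoted Table 1.

```python
# Hedged sketch: alternating IRL training for text generation.
# Everything except the two Adam learning rates is an assumption.
import torch
import torch.nn as nn

VOCAB, EMB, HID, MAX_LEN, BATCH = 1000, 32, 32, 20, 64

class Generator(nn.Module):
    """Toy LSTM language model acting as the text-generation policy."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def sample(self, batch):
        """Sample token sequences and their per-step log-probabilities."""
        tok = torch.zeros(batch, 1, dtype=torch.long)  # BOS = 0 (assumed)
        state, seqs, logps = None, [], []
        for _ in range(MAX_LEN):
            h, state = self.rnn(self.emb(tok), state)
            dist = torch.distributions.Categorical(logits=self.out(h[:, -1]))
            tok = dist.sample().unsqueeze(1)
            seqs.append(tok)
            logps.append(dist.log_prob(tok.squeeze(1)))
        return torch.cat(seqs, 1), torch.stack(logps, 1)

class RewardApprox(nn.Module):
    """Toy reward approximator r_phi: maps a token sequence to a scalar."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.score = nn.Linear(HID, 1)

    def forward(self, seqs):
        h, _ = self.rnn(self.emb(seqs))
        return self.score(h[:, -1]).squeeze(-1)

gen, rew = Generator(), RewardApprox()
opt_g = torch.optim.Adam(gen.parameters(), lr=0.005)   # lr from Table 1
opt_r = torch.optim.Adam(rew.parameters(), lr=0.0004)  # lr from Table 1

real = torch.randint(0, VOCAB, (BATCH, MAX_LEN))  # stand-in for corpus texts

for step in range(2):  # a couple of demo iterations
    # Reward update: raise reward on real texts relative to a sample-based
    # log-partition estimate (a max-entropy IRL surrogate; loss form assumed).
    fake, _ = gen.sample(BATCH)
    loss_r = -(rew(real).mean() - torch.logsumexp(rew(fake), 0))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # Generator update: entropy-regularized policy gradient, using the
    # learned reward minus the sequence log-probability as the return.
    fake, logps = gen.sample(BATCH)
    seq_logp = logps.sum(1)
    advantage = (rew(fake) - seq_logp).detach()  # reward + entropy bonus
    loss_g = -(seq_logp * advantage).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The alternation mirrors the IRL setup described in the paper: the reward approximator is refit against real versus sampled texts, and the generator then follows the learned reward; the exact losses above are stand-ins, not the published ones.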
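
For convenience, the Table 1 values quoted in the Experiment Setup row can be transcribed into a config sketch. The dict names and the per-corpus ordering (synthetic, COCO, IMDB) are assumptions based on the table having one column per corpus; the batch sizes are kept exactly as quoted because mapping two listed values onto three corpora is not recoverable from the quote.

```python
# Hedged transcription of the quoted Table 1 settings. List order
# (synthetic, COCO, IMDB) is assumed; names are ours, not the paper's.
GENERATOR_CONFIG = {
    "embedding_dim": [32, 64, 128],   # assumed per-corpus order
    "hidden_dim": [32, 64, 128],
    "batch_size": [64, 128],          # column mapping unclear in the quote
    "optimizer": "Adam",
    "learning_rate": 0.005,
}

REWARD_APPROXIMATOR_CONFIG = {
    "dropout": [0.75, 0.45, 0.75],
    "batch_size": [64, 1024],         # column mapping unclear in the quote
    "optimizer": "Adam",
    "learning_rate": 0.0004,
}
```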