Preference-Controlled Multi-Objective Reinforcement Learning for Conditional Text Generation

Authors: Wenqing Chen, Jidong Tian, Caoyun Fan, Yitian Li, Hao He, Yaohui Jin

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on paraphrasing and image captioning tasks, which show that in the fidelity-diversity trade-off space, our model outperforms both deterministic and CVAE-based baselines." From the Experiments section: "We experiment with two tasks including paraphrasing and image captioning."
Researcher Affiliation | Academia | Wenqing Chen (1), Jidong Tian (2,3), Caoyun Fan (2,3), Yitian Li (2,3), Hao He (2,3), Yaohui Jin (2,3); (1) School of Software Engineering, Sun Yat-sen University; (2) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; (3) State Key Lab of Advanced Optical Communication System and Network, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks, nor does it have sections explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not provide an unambiguous statement about releasing source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | "For paraphrasing, we follow recent work (Li et al. 2018; Fu, Feng, and Cunningham 2019; Chen et al. 2020; Zhou and Bhat 2021) and experiment on two commonly used datasets: Quora and MSCOCO. The MSCOCO dataset (Lin et al. 2014) was originally developed for image captioning. We use the Karpathy data split (Karpathy and Li 2015) with 5k images for validation, 5k images for testing, and the rest for training..."
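Since the captioning experiments rely on the Karpathy split of MSCOCO, a minimal loading sketch may help readers reconstruct the data. It assumes the commonly distributed dataset_coco.json file from the Karpathy split release, in which each image entry carries a "split" field and a list of reference "sentences"; the file name and field handling here are assumptions for illustration, not the authors' code.

```python
import json
from collections import defaultdict

def load_karpathy_split(path="dataset_coco.json"):
    """Group MSCOCO images by the Karpathy split (assumed file format)."""
    with open(path) as f:
        data = json.load(f)
    splits = defaultdict(list)
    for img in data["images"]:
        # Folding "restval" into train yields the 113,287 training images
        # reported in the paper's split table.
        split = "train" if img["split"] == "restval" else img["split"]
        captions = [s["raw"] for s in img["sentences"]]
        splits[split].append({"filename": img["filename"], "captions": captions})
    return splits

splits = load_karpathy_split()
print({name: len(images) for name, images in splits.items()})
# expected: train 113,287 / val 5,000 / test 5,000
```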
Dataset Splits | Yes | "We only use the paraphrase sentences and hold out 3k and 30k validation and test sets respectively. ... We use the Karpathy data split (Karpathy and Li 2015) with 5k images for validation, 5k images for testing, and the rest for training..."

Task | Dataset | Train | Valid | Test | Nref
Paraphrasing | Quora | 116,263 | 3,000 | 30,000 | 1
Paraphrasing | MSCOCO | 78,733 | 4,050 | 40,504 | 4
Captioning | MSCOCO | 113,287 | 5,000 | 5,000 | 5
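The split sizes above can double as a sanity check when re-preparing the data. The sketch below is a hypothetical helper, not from the paper; only the numbers are taken from the reported table.

```python
# Reported split sizes, expressed as a lookup table for sanity checks.
# Keys and the helper function are illustrative; only the numbers come from the paper.
EXPECTED_SPLITS = {
    ("paraphrasing", "quora"):  {"train": 116_263, "valid": 3_000, "test": 30_000, "n_ref": 1},
    ("paraphrasing", "mscoco"): {"train": 78_733,  "valid": 4_050, "test": 40_504, "n_ref": 4},
    ("captioning",  "mscoco"):  {"train": 113_287, "valid": 5_000, "test": 5_000,  "n_ref": 5},
}

def check_split_sizes(task, dataset, **actual_sizes):
    """Raise if locally prepared split sizes drift from the reported ones."""
    expected = EXPECTED_SPLITS[(task, dataset)]
    for name, size in actual_sizes.items():
        if size != expected[name]:
            raise ValueError(f"{task}/{dataset} {name}: got {size}, expected {expected[name]}")

check_split_sizes("paraphrasing", "quora", train=116_263, valid=3_000, test=30_000)
```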
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types.
Software Dependencies | Yes | "The used BERT version is roberta-large_L17_no-idf_version=0.3.8(hug_trans=4.5.0)"
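The quoted string matches the hash format printed by the bert-score Python package, where "roberta-large_L17_no-idf" corresponds to model_type="roberta-large", num_layers=17, and idf=False. A minimal sketch of that configuration follows; the candidate and reference strings are illustrative, and pinning bert-score==0.3.8 with transformers==4.5.0 is assumed to reproduce the exact version tag.

```python
from bert_score import score

candidates = ["a man rides a surfboard on a large wave"]   # generated text (illustrative)
references = ["a person is surfing on a big wave"]         # reference text (illustrative)

# model_type="roberta-large" with num_layers=17 and idf=False matches the
# "roberta-large_L17_no-idf" tag in the reported version string.
P, R, F1 = score(
    candidates,
    references,
    model_type="roberta-large",
    num_layers=17,
    idf=False,
)
print(F1.mean().item())
```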
Experiment Setup | Yes | "The hidden sizes of embedding layers, bidirectional LSTM encoders, LSTM decoders, and attention layers are all set to 1,024. All models are optimized with MLE for 70 epochs and RL for another 35 epochs. The learning rate is initialized as 0.0005 with a Noam schedule including 10,000 warmup steps in the MLE period and set to 0.00001 in the RL period. The batch size is set to 40. The default beam size is 3 and the generated text group size is 5. We set the hyperparameter λ_σ = 2.0 for paraphrasing and λ_σ = 1.0 for captioning. We set α = 0.5 and β = 1.0 for paraphrasing on Quora, α = 1.0 and β = 1.0 for other datasets."
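For reference, the reported hyperparameters can be collected into a single configuration object together with one common reading of the Noam schedule whose peak equals the stated initial learning rate. The field names, the normalization, and the helper function are assumptions for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Reported hyperparameters; field names are illustrative."""
    hidden_size: int = 1024      # embeddings, BiLSTM encoder, LSTM decoder, attention
    mle_epochs: int = 70         # maximum-likelihood training period
    rl_epochs: int = 35          # reinforcement-learning period
    mle_lr: float = 5e-4         # initial LR under the Noam schedule
    rl_lr: float = 1e-5          # fixed LR in the RL period
    warmup_steps: int = 10_000
    batch_size: int = 40
    beam_size: int = 3
    group_size: int = 5          # size of the generated text group
    lambda_sigma: float = 2.0    # 2.0 for paraphrasing, 1.0 for captioning
    alpha: float = 0.5           # 0.5 for Quora paraphrasing, 1.0 for other datasets
    beta: float = 1.0

def noam_lr(step: int, base_lr: float, warmup: int) -> float:
    """Noam schedule normalized so the peak learning rate equals base_lr.

    One common reading of "initialized as 0.0005 with a Noam schedule";
    the authors' exact scaling is not specified in the paper.
    """
    step = max(step, 1)
    return base_lr * math.sqrt(warmup) * min(step ** -0.5, step * warmup ** -1.5)

cfg = TrainConfig()
print(noam_lr(1, cfg.mle_lr, cfg.warmup_steps))       # tiny LR at the start of warmup
print(noam_lr(10_000, cfg.mle_lr, cfg.warmup_steps))  # peaks at 0.0005 at the warmup boundary
print(noam_lr(40_000, cfg.mle_lr, cfg.warmup_steps))  # decays as 1/sqrt(step) afterwards
```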