Preference-Controlled Multi-Objective Reinforcement Learning for Conditional Text Generation

Authors: Wenqing Chen, Jidong Tian, Caoyun Fan, Yitian Li, Hao He, Yaohui Jin

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on paraphrasing and image captioning tasks, which show that in the fidelity-diversity trade-off space, our model outperforms both deterministic and CVAE-based baselines." From the Experiments section: "We experiment with two tasks including paraphrasing and image captioning."
Researcher Affiliation | Academia | Wenqing Chen (1), Jidong Tian (2,3), Caoyun Fan (2,3), Yitian Li (2,3), Hao He (2,3), Yaohui Jin (2,3); (1) School of Software Engineering, Sun Yat-sen University; (2) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; (3) State Key Lab of Advanced Optical Communication System and Network, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks, nor does it have sections explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not provide an unambiguous statement about releasing source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | "For paraphrasing, we follow recent work (Li et al. 2018; Fu, Feng, and Cunningham 2019; Chen et al. 2020; Zhou and Bhat 2021) and experiment on two commonly used datasets: Quora and MSCOCO. The MSCOCO dataset (Lin et al. 2014) was originally developed for image captioning. We use the Karpathy data split (Karpathy and Li 2015) with 5k images for validation, 5k images for testing, and the rest for training..."
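Since the captioning experiments rely on the Karpathy split of MSCOCO, a minimal loading sketch may help readers reconstruct the data. It assumes the commonly distributed dataset_coco.json file from the Karpathy split release, in which each image entry carries a "split" field and a list of reference "sentences"; the file name and field handling here are assumptions for illustration, not the authors' code.

```python
import json
from collections import defaultdict

def load_karpathy_split(path="dataset_coco.json"):
    """Group MSCOCO images by the Karpathy split (assumed file format)."""
    with open(path) as f:
        data = json.load(f)
    splits = defaultdict(list)
    for img in data["images"]:
        # Folding "restval" into train yields the 113,287 training images
        # reported in the paper's split table.
        split = "train" if img["split"] == "restval" else img["split"]
        captions = [s["raw"] for s in img["sentences"]]
        splits[split].append({"filename": img["filename"], "captions": captions})
    return splits

splits = load_karpathy_split()
print({name: len(images) for name, images in splits.items()})
# expected: train 113,287 / val 5,000 / test 5,000
```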
Dataset Splits | Yes | "We only use the paraphrase sentences and hold out 3k and 30k validation and test sets respectively. ... We use the Karpathy data split (Karpathy and Li 2015) with 5k images for validation, 5k images for testing, and the rest for training..."

Task | Dataset | Train | Valid | Test | Nref
Paraphrasing | Quora | 116,263 | 3,000 | 30,000 | 1
Paraphrasing | MSCOCO | 78,733 | 4,050 | 40,504 | 4
Captioning | MSCOCO | 113,287 | 5,000 | 5,000 | 5
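The split sizes above can double as a sanity check when re-preparing the data. The sketch below is a hypothetical helper, not from the paper; only the numbers are taken from the reported table.

```python
# Reported split sizes, expressed as a lookup table for sanity checks.
# Keys and the helper function are illustrative; only the numbers come from the paper.
EXPECTED_SPLITS = {
    ("paraphrasing", "quora"):  {"train": 116_263, "valid": 3_000, "test": 30_000, "n_ref": 1},
    ("paraphrasing", "mscoco"): {"train": 78_733,  "valid": 4_050, "test": 40_504, "n_ref": 4},
    ("captioning",  "mscoco"):  {"train": 113_287, "valid": 5_000, "test": 5_000,  "n_ref": 5},
}

def check_split_sizes(task, dataset, **actual_sizes):
    """Raise if locally prepared split sizes drift from the reported ones."""
    expected = EXPECTED_SPLITS[(task, dataset)]
    for name, size in actual_sizes.items():
        if size != expected[name]:
            raise ValueError(f"{task}/{dataset} {name}: got {size}, expected {expected[name]}")

check_split_sizes("paraphrasing", "quora", train=116_263, valid=3_000, test=30_000)
```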
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU or CPU models, memory specifications, or cloud computing instance types.
Software Dependencies | Yes | "The used BERT version is roberta-large_L17_no-idf_version=0.3.8(hug_trans=4.5.0)"
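The quoted string matches the hash format printed by the bert-score Python package, where "roberta-large_L17_no-idf" corresponds to model_type="roberta-large", num_layers=17, and idf=False. A minimal sketch of that configuration follows; the candidate and reference strings are illustrative, and pinning bert-score==0.3.8 with transformers==4.5.0 is assumed to reproduce the exact version tag.

```python
from bert_score import score

candidates = ["a man rides a surfboard on a large wave"]   # generated text (illustrative)
references = ["a person is surfing on a big wave"]         # reference text (illustrative)

# model_type="roberta-large" with num_layers=17 and idf=False matches the
# "roberta-large_L17_no-idf" tag in the reported version string.
P, R, F1 = score(
    candidates,
    references,
    model_type="roberta-large",
    num_layers=17,
    idf=False,
)
print(F1.mean().item())
```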
Experiment Setup | Yes | "The hidden sizes of embedding layers, bidirectional LSTM encoders, LSTM decoders, and attention layers are all set to 1,024. All models are optimized with MLE for 70 epochs and RL for another 35 epochs. The learning rate is initialized as 0.0005 with a Noam schedule including 10,000 warmup steps in the MLE period and set to 0.00001 in the RL period. The batch size is set to 40. The default beam size is 3 and the generated text group size is 5. We set the hyperparameter λ_σ = 2.0 for paraphrasing and λ_σ = 1.0 for captioning. We set α = 0.5 and β = 1.0 for paraphrasing on Quora, α = 1.0 and β = 1.0 for other datasets."
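For reference, the reported hyperparameters can be collected into a single configuration object together with one common reading of the Noam schedule whose peak equals the stated initial learning rate. The field names, the normalization, and the helper function are assumptions for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Reported hyperparameters; field names are illustrative."""
    hidden_size: int = 1024      # embeddings, BiLSTM encoder, LSTM decoder, attention
    mle_epochs: int = 70         # maximum-likelihood training period
    rl_epochs: int = 35          # reinforcement-learning period
    mle_lr: float = 5e-4         # initial LR under the Noam schedule
    rl_lr: float = 1e-5          # fixed LR in the RL period
    warmup_steps: int = 10_000
    batch_size: int = 40
    beam_size: int = 3
    group_size: int = 5          # size of the generated text group
    lambda_sigma: float = 2.0    # 2.0 for paraphrasing, 1.0 for captioning
    alpha: float = 0.5           # 0.5 for Quora paraphrasing, 1.0 for other datasets
    beta: float = 1.0

def noam_lr(step: int, base_lr: float, warmup: int) -> float:
    """Noam schedule normalized so the peak learning rate equals base_lr.

    One common reading of "initialized as 0.0005 with a Noam schedule";
    the authors' exact scaling is not specified in the paper.
    """
    step = max(step, 1)
    return base_lr * math.sqrt(warmup) * min(step ** -0.5, step * warmup ** -1.5)

cfg = TrainConfig()
print(noam_lr(1, cfg.mle_lr, cfg.warmup_steps))       # tiny LR at the start of warmup
print(noam_lr(10_000, cfg.mle_lr, cfg.warmup_steps))  # peaks at 0.0005 at the warmup boundary
print(noam_lr(40_000, cfg.mle_lr, cfg.warmup_steps))  # decays as 1/sqrt(step) afterwards
```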