Semi-Offline Reinforcement Learning for Optimized Text Generation

Authors: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods."
Researcher Affiliation | Collaboration | "1 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2 The work was done during the author's internship at Microsoft Research Asia; 3 Microsoft Research Asia, Beijing, China; 4 Georgia Institute of Technology, Atlanta, USA; 5 Microsoft, Redmond, USA. Correspondence to: Xiting Wang <xitwan@microsoft.com>, Rui Yan <ruiyan@ruc.edu.cn>."
Pseudocode | No | The paper does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code is available at https://github.com/ChangyuChen347/semi-offline-RL."
Open Datasets | Yes | "We conduct experiments on 1) a summarization dataset CNN/DM (Hermann et al., 2015); 2) a dialogue summarization dataset SAMSum (Gliwa et al., 2019); 3) a natural question generation dataset SQuAD (Rajpurkar et al., 2016); 4) an extreme summarization dataset XSum (Narayan et al., 2018)."
Dataset Splits | Yes | "Table 9. Statistical information on the datasets." For CNN/DM: # TRAIN 287,113, # DEV 13,368, # TEST 11,490 (see the loading sketch below the table).
Hardware Specification | Yes | "The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods."
Software Dependencies | No | The paper mentions base models like BART and T5, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods." Table 8 provides further per-dataset details: batch size 16, learning rate 1e-6 to 3e-6, λ (weight of L_RL) 20 to 2, # samples 64 to 16, mask rate p_m 0.4 (see the configuration sketch below the table).
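As a quick check on the Dataset Splits row, the CNN/DM split sizes quoted from Table 9 can be compared against the standard CNN/DailyMail 3.0.0 release. This is a minimal sketch, assuming the Hugging Face `datasets` library is available (the paper does not pin its software stack):

```python
# Minimal sketch: compare the CNN/DM split sizes quoted from Table 9 against
# the standard CNN/DailyMail 3.0.0 release. Assumes Hugging Face `datasets`
# is installed; the paper itself does not specify library versions.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0")

expected = {"train": 287_113, "validation": 13_368, "test": 11_490}
for split, expected_size in expected.items():
    actual_size = len(cnn_dm[split])
    status = "OK" if actual_size == expected_size else "MISMATCH"
    print(f"{split}: expected {expected_size:,}, found {actual_size:,} [{status}]")
```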
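The hyperparameters quoted in the Experiment Setup row can be gathered in one place for reproduction. The dataclass below is a hypothetical sketch: the field names and the BART checkpoint are assumptions for illustration, not taken from the authors' released code.

```python
# Hypothetical configuration sketch collecting the hyperparameters quoted above.
# Field names and the default checkpoint are illustrative assumptions, not the
# authors' actual configuration format.
from dataclasses import dataclass

@dataclass
class SemiOfflineRLConfig:
    base_model: str = "facebook/bart-large"  # BART and T5 are the base models named in the paper
    learning_rate: float = 1e-6              # Table 8: 1e-6 to 3e-6, depending on dataset
    batch_size: int = 8                      # 8 for all compared methods; Table 8 also lists 16
    rl_loss_weight: float = 20.0             # λ, weight of L_RL; Table 8: 20 down to 2
    num_samples: int = 64                    # number of samples; Table 8: 64 down to 16
    mask_rate: float = 0.4                   # p_m, the mask rate

config = SemiOfflineRLConfig()
print(config)
```

Ranges such as "1e-6 to 3e-6" and "64 to 16" indicate dataset-dependent values, so a reproduction would presumably instantiate one such configuration per dataset.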