Semi-Offline Reinforcement Learning for Optimized Text Generation

Authors: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods."
Researcher Affiliation | Collaboration | "1 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2 The work was done during the author's internship at Microsoft Research Asia; 3 Microsoft Research Asia, Beijing, China; 4 Georgia Institute of Technology, Atlanta, USA; 5 Microsoft, Redmond, USA. Correspondence to: Xiting Wang <xitwan@microsoft.com>, Rui Yan <ruiyan@ruc.edu.cn>."
Pseudocode | No | The paper does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code is available at https://github.com/ChangyuChen347/semi-offline-RL."
Open Datasets | Yes | "We conduct experiments on 1) a summarization dataset CNN/DM (Hermann et al., 2015); 2) a dialogue summarization dataset SAMSum (Gliwa et al., 2019); 3) a natural question generation dataset SQuAD (Rajpurkar et al., 2016); 4) an extreme summarization dataset XSum (Narayan et al., 2018)."
Dataset Splits | Yes | "Table 9. Statistical information on the datasets." For CNN/DM: # TRAIN 287,113, # DEV 13,368, # TEST 11,490 (see the loading sketch below the table).
Hardware Specification | Yes | "The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods."
Software Dependencies | No | The paper mentions base models like BART and T5, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods." Table 8 provides further per-dataset details: batch size 16, learning rate 1e-6 to 3e-6, λ (weight of L_RL) 20 to 2, # samples 64 to 16, mask rate p_m 0.4 (see the configuration sketch below the table).
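As a quick check on the Dataset Splits row, the CNN/DM split sizes quoted from Table 9 can be compared against the standard CNN/DailyMail 3.0.0 release. This is a minimal sketch, assuming the Hugging Face `datasets` library is available (the paper does not pin its software stack):

```python
# Minimal sketch: compare the CNN/DM split sizes quoted from Table 9 against
# the standard CNN/DailyMail 3.0.0 release. Assumes Hugging Face `datasets`
# is installed; the paper itself does not specify library versions.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0")

expected = {"train": 287_113, "validation": 13_368, "test": 11_490}
for split, expected_size in expected.items():
    actual_size = len(cnn_dm[split])
    status = "OK" if actual_size == expected_size else "MISMATCH"
    print(f"{split}: expected {expected_size:,}, found {actual_size:,} [{status}]")
```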
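The hyperparameters quoted in the Experiment Setup row can be gathered in one place for reproduction. The dataclass below is a hypothetical sketch: the field names and the BART checkpoint are assumptions for illustration, not taken from the authors' released code.

```python
# Hypothetical configuration sketch collecting the hyperparameters quoted above.
# Field names and the default checkpoint are illustrative assumptions, not the
# authors' actual configuration format.
from dataclasses import dataclass

@dataclass
class SemiOfflineRLConfig:
    base_model: str = "facebook/bart-large"  # BART and T5 are the base models named in the paper
    learning_rate: float = 1e-6              # Table 8: 1e-6 to 3e-6, depending on dataset
    batch_size: int = 8                      # 8 for all compared methods; Table 8 also lists 16
    rl_loss_weight: float = 20.0             # λ, weight of L_RL; Table 8: 20 down to 2
    num_samples: int = 64                    # number of samples; Table 8: 64 down to 16
    mask_rate: float = 0.4                   # p_m, the mask rate

config = SemiOfflineRLConfig()
print(config)
```

Ranges such as "1e-6 to 3e-6" and "64 to 16" indicate dataset-dependent values, so a reproduction would presumably instantiate one such configuration per dataset.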