Semi-Offline Reinforcement Learning for Optimized Text Generation
Authors: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2The work was done during the author's internship at Microsoft Research Asia; 3Microsoft Research Asia, Beijing, China; 4Georgia Institute of Technology, Atlanta, USA; 5Microsoft, Redmond, USA. Correspondence to: Xiting Wang <xitwan@microsoft.com>, Rui Yan <ruiyan@ruc.edu.cn>. |
| Pseudocode | No | The paper does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Our code is available at https://github.com/ChangyuChen347/semi-offline-RL. |
| Open Datasets | Yes | We conduct experiments on 1) a summarization dataset CNN/DM (Hermann et al., 2015); 2) a dialogue summarization dataset SAMSum (Gliwa et al., 2019); 3) a natural question generation dataset SQuAD (Rajpurkar et al., 2016); 4) an extreme summarization dataset XSum (Narayan et al., 2018) (see the loading sketch after the table) |
| Dataset Splits | Yes | Table 9. Statistical information on the datasets. CNN/DM: # TRAIN 287,113, # DEV 13,368, # TEST 11,490 |
| Hardware Specification | Yes | The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods. |
| Software Dependencies | No | The paper mentions base models like BART and T5, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The experiments are run on a machine with an Nvidia A40 GPU (memory: 48 GB) using a learning rate of 1e-6 and a batch size of 8 for all compared methods. Table 8 provides further details: BATCH SIZE 16, LEARNING RATE 1E-6 to 3E-6, λ (WEIGHT OF L_RL) 20 to 2, # SAMPLE 64 to 16, p_m (MASK RATE) 0.4 (see the hedged configuration sketch after the table). |
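
The four benchmark datasets quoted in the Open Datasets row are publicly available. Below is a minimal sketch, not taken from the paper, of loading them with the Hugging Face `datasets` library; the dataset identifiers and the choice of library are our assumptions, and the printed CNN/DM split sizes should match the figures quoted from the paper's Table 9.

```python
# Minimal sketch (assumption: Hugging Face `datasets` hosts all four corpora
# under these identifiers; the paper itself does not specify a loading library).
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0")  # summarization (Hermann et al., 2015)
samsum = load_dataset("samsum")                  # dialogue summarization (Gliwa et al., 2019)
squad = load_dataset("squad")                    # question generation source (Rajpurkar et al., 2016)
xsum = load_dataset("xsum")                      # extreme summarization (Narayan et al., 2018)

# Sanity check against the CNN/DM split sizes quoted from Table 9.
print(len(cnn_dm["train"]), len(cnn_dm["validation"]), len(cnn_dm["test"]))
# Expected: 287113 13368 11490
```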
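
The Hardware Specification and Experiment Setup rows together give enough detail for a rough reproduction of the training configuration. The following is a hedged sketch rather than the authors' code: the field names, the dataclass structure, and the BART checkpoint identifier are our assumptions, while the numeric values are taken directly from the quotes above.

```python
# Hedged reproduction config; numeric values come from the quoted text and Table 8,
# everything else (names, checkpoint, structure) is assumed for illustration only.
from dataclasses import dataclass

@dataclass
class SemiOfflineRLConfig:
    base_model: str = "facebook/bart-large"  # assumption: one of the BART/T5 base models mentioned
    learning_rate: float = 1e-6              # 1e-6 to 3e-6 across datasets (Table 8)
    batch_size: int = 8                      # 8 in the main text; Table 8 reports 16
    rl_loss_weight: float = 20.0             # λ, weight of L_RL; 20 to 2 across datasets
    num_samples: int = 64                    # "# SAMPLE"; 64 to 16 across datasets
    mask_rate: float = 0.4                   # p_m
    device: str = "cuda"                     # single Nvidia A40 (48 GB), per the paper

config = SemiOfflineRLConfig()
print(config)
```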