Towards Generalizable Reinforcement Learning for Trade Execution
Authors: Chuheng Zhang, Yitong Duan, Xiaoyu Chen, Jianyu Chen, Jian Li, Li Zhao
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance. |
| Researcher Affiliation | Collaboration | (1) Microsoft Research; (2) IIIS, Tsinghua University |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the source code in https://github.com/zhangchuheng123/RL4Execution. |
| Open Datasets | No | Our simulator is based on the LOB data of 100 most liquid stocks in China A-share market. The data collected from April 2022 to June 2022 is used as the training set, and the data collected during July 2022 and August 2022 are used as the validation and testing set respectively. The paper gives no link, DOI, or citation for public access to this dataset, which suggests that the authors collected it themselves and have not released it. |
| Dataset Splits | Yes | The data collected from April 2022 to June 2022 is used as the training set, and the data collected during July 2022 and August 2022 are used as the validation and testing set respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper names DDPG, DQN, and PPO as base RL algorithms, but does not identify the libraries used to implement them or give version numbers for any software dependency. |
| Experiment Setup | Yes | For the simplified task, 'we use DDPG [Silver et al., 2014] as the base RL algorithm'. For the high-fidelity simulation, the 'task is to sell 0.5% of the total trading volume... in a 30-minute period randomly selected from a trading day. The agent makes a decision... at the start of each minute.' The paper also describes the loss functions for CASH and CATE: 'The loss is a mean-squared error between the context representation and the generated statistics', and for CATE the loss function L(θ, ϑ, w) is given explicitly. |
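
The chronological split in the Dataset Splits row and the mean-squared-error loss quoted in the Experiment Setup row can be illustrated with a minimal sketch. The month boundaries come from the table above; the function names `assign_split` and `mse`, the `"unused"` label, and the plain-list inputs are assumptions made for illustration and are not taken from the authors' repository.

```python
from datetime import date


def assign_split(day: date) -> str:
    """Map a trading day to the split described in the Dataset Splits row.

    Hypothetical helper: days outside April-August 2022 are labeled "unused".
    """
    if date(2022, 4, 1) <= day <= date(2022, 6, 30):
        return "train"  # April 2022 to June 2022
    if date(2022, 7, 1) <= day <= date(2022, 7, 31):
        return "valid"  # July 2022
    if date(2022, 8, 1) <= day <= date(2022, 8, 31):
        return "test"   # August 2022
    return "unused"


def mse(representation, statistics):
    """Mean-squared error between two equal-length sequences of floats.

    Illustrative only: the paper's loss compares a context representation
    with generated statistics; both are plain Python lists here.
    """
    if len(representation) != len(statistics) or not representation:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - s) ** 2 for r, s in zip(representation, statistics)]
    return sum(squared) / len(squared)
```

Splitting by calendar month keeps the validation and test periods strictly later than the training period, which is the property the quoted split relies on to measure generalization.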