An End-to-End Optimal Trade Execution Framework based on Proximal Policy Optimization

Authors: Siyu Lin, Peter A. Beling

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate advantages over IS and the shaped reward function in terms of both performance and simplicity. The proposed framework outperforms commonly used industry baseline models such as TWAP, VWAP, and AC, as well as several Deep Reinforcement Learning (DRL) models, on most of the 14 US equities in the experiments. (A hedged sketch of the TWAP and implementation-shortfall baselines appears after the table.)
Researcher Affiliation | Academia | Siyu Lin and Peter A. Beling, University of Virginia, Charlottesville, VA, USA ({sl5tb, pb3a}@virginia.edu)
Pseudocode | No | The paper describes algorithms and architectures in text and diagrams but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper links to Ray RLlib (footnote 6: https://ray.readthedocs.io/en/latest/rllib.html), the platform used by the authors, but does not provide a link to the authors' own implementation of the work described in this paper.
Open Datasets | No | The paper states: "We use the NYSE daily millisecond TAQ data from January 1st, 2018 to December 31st, 2018, downloaded from WRDS." It mentions downloading the data from WRDS but does not provide a direct public link, DOI, or citation to a resource that makes the dataset openly available.
Dataset Splits | No | The paper states: "Then, we split the data into training (January-September) and test sets (October-December)." It mentions tuning hyperparameters on Facebook data, which implies a validation process, but does not specify a distinct validation split or its size/period. (A minimal date-based split sketch appears after the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper states: "In our experiment, we implement a distributed version of the PPO algorithm in Ray RLlib and fine-tune the hyperparameters such as the neural network architecture, learning rate, etc. using Tune, a research platform developed by Liaw et al. [2018]." It names Ray RLlib and Tune but does not provide version numbers for either software component.
Experiment Setup | Yes | Table 3: Hyperparameters for PPO LSTM and PPO Stack (values given as PPO LSTM / PPO Stack). Minibatch size: 32 / 32; sample batch size: 5 / 5; train batch size: 240 / 240; discount factor: 1 / 1; learning rate: linearly annealed between 5e-5 and 1e-5 (both); KL coeff: 0.2 / 0.2; VF loss coeff: 1 / 1; entropy coeff: 0.01 / 0.01; clip param: 0.2 / 0.2; hidden layers: 2 hidden layers with 128 nodes each (both); activation functions: ReLU for hidden layers and linear for the output layer (both); input/output nodes: input 22, output 51 / input 82, output 51; maximum sequence length: 12 / none; LSTM cell size: 128 / none; stack steps: none / 4. (A hedged RLlib configuration sketch mapping these values appears after the table.)
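
For readers unfamiliar with the baselines cited under Research Type, the following is a minimal illustrative sketch of a TWAP schedule and of implementation shortfall (IS) measured against the arrival price. The function names and the sell-order sign convention are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: not the authors' code.
# TWAP splits a parent order into (near-)equal child orders over the horizon;
# implementation shortfall measures execution cost against the arrival price.

def twap_schedule(total_shares: int, n_intervals: int) -> list:
    """Split total_shares as evenly as possible across n_intervals."""
    base, remainder = divmod(total_shares, n_intervals)
    # Spread any remainder one share at a time over the earliest intervals.
    return [base + (1 if i < remainder else 0) for i in range(n_intervals)]

def implementation_shortfall(arrival_price: float, fills: list) -> float:
    """For a sell order: arrival-price value of the shares minus realized proceeds.

    `fills` is a list of (quantity, price) tuples; a positive result means the
    execution did worse than the arrival-price benchmark.
    """
    shares = sum(q for q, _ in fills)
    proceeds = sum(q * p for q, p in fills)
    return arrival_price * shares - proceeds

# Example: 10,000 shares over 12 intervals, then IS for two partial fills.
print(twap_schedule(10_000, 12))
print(implementation_shortfall(100.0, [(5_000, 99.8), (5_000, 99.9)]))
```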
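
The calendar split quoted under Dataset Splits can be expressed as a minimal sketch, assuming the 2018 TAQ-derived features for one equity have already been aggregated into a pandas DataFrame with a DatetimeIndex; the file name and loading step are hypothetical, since the paper's data pipeline is not released.

```python
import pandas as pd

# Hypothetical input: per-interval features for one ticker over 2018,
# indexed by timestamp (the authors' actual feature pipeline is not public).
features = pd.read_parquet("features_2018.parquet").sort_index()

# Calendar split reported in the paper: Jan-Sep for training, Oct-Dec for testing.
train = features.loc["2018-01-01":"2018-09-30"]
test = features.loc["2018-10-01":"2018-12-31"]
```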
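
To make the Table 3 values concrete, the sketch below maps them onto a Ray RLlib PPO configuration for the PPO LSTM variant, using config keys from RLlib releases contemporary with the paper (e.g. sample_batch_size, which newer releases rename to rollout_fragment_length). The environment name, its registration, and the endpoint of the learning-rate schedule are assumptions, since the authors' code is not available.

```python
import ray
from ray import tune

# Sketch only: Table 3 values for the PPO LSTM variant in an older-release
# RLlib config. "OptimalExecution-v0" is a hypothetical environment name and
# the 1e6-step endpoint of the lr schedule is an assumption.
config = {
    "env": "OptimalExecution-v0",
    "sgd_minibatch_size": 32,                       # minibatch size
    "sample_batch_size": 5,                         # per-worker rollout fragment
    "train_batch_size": 240,
    "gamma": 1.0,                                   # discount factor
    "lr_schedule": [[0, 5e-5], [1_000_000, 1e-5]],  # linear annealing 5e-5 -> 1e-5
    "kl_coeff": 0.2,
    "vf_loss_coeff": 1.0,
    "entropy_coeff": 0.01,
    "clip_param": 0.2,
    "model": {
        "fcnet_hiddens": [128, 128],                # 2 hidden layers, 128 nodes each
        "fcnet_activation": "relu",                 # linear output is RLlib's default
        "use_lstm": True,
        "max_seq_len": 12,                          # maximum sequence length
        "lstm_cell_size": 128,
    },
}

ray.init()
tune.run("PPO", config=config, stop={"training_iteration": 100})
```

The PPO Stack variant would drop the three LSTM-related model keys and instead stack the most recent 4 observations into the input, matching the "stack steps: 4" row of Table 3.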