An End-to-End Optimal Trade Execution Framework based on Proximal Policy Optimization

Authors: Siyu Lin, Peter A. Beling

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate advantages over IS and the shaped reward function in terms of both performance and simplicity. The proposed framework outperforms commonly used industry baseline models such as TWAP, VWAP, and AC, as well as several Deep Reinforcement Learning (DRL) models, on most of the 14 US equities in the experiments. (A hedged sketch of the TWAP and implementation-shortfall baselines appears after the table.)
Researcher Affiliation | Academia | Siyu Lin and Peter A. Beling, University of Virginia, Charlottesville, VA, USA ({sl5tb, pb3a}@virginia.edu)
Pseudocode | No | The paper describes algorithms and architectures in text and diagrams but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper links to Ray RLlib (footnote 6: https://ray.readthedocs.io/en/latest/rllib.html), the platform used by the authors, but does not provide a link to the authors' own implementation of the work described in this paper.
Open Datasets | No | The paper states: "We use the NYSE daily millisecond TAQ data from January 1st, 2018 to December 31st, 2018, downloaded from WRDS." It mentions downloading the data from WRDS but does not provide a direct public link, DOI, or citation to a resource that makes the dataset openly available.
Dataset Splits | No | The paper states: "Then, we split the data into training (January-September) and test sets (October-December)." It mentions tuning hyperparameters on Facebook data, which implies a validation process, but does not specify a distinct validation split or its size/period. (A minimal date-based split sketch appears after the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU or CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper states: "In our experiment, we implement a distributed version of the PPO algorithm in Ray RLlib and fine-tune the hyperparameters such as the neural network architecture, learning rate, etc. using Tune, a research platform developed by Liaw et al. [2018]." It names Ray RLlib and Tune but does not provide version numbers for either software component.
Experiment Setup | Yes | Table 3: Hyperparameters for PPO LSTM and PPO Stack (values given as PPO LSTM / PPO Stack). Minibatch size: 32 / 32; sample batch size: 5 / 5; train batch size: 240 / 240; discount factor: 1 / 1; learning rate: linearly annealed between 5e-5 and 1e-5 (both); KL coeff: 0.2 / 0.2; VF loss coeff: 1 / 1; entropy coeff: 0.01 / 0.01; clip param: 0.2 / 0.2; hidden layers: 2 hidden layers with 128 nodes each (both); activation functions: ReLU for hidden layers and linear for the output layer (both); input/output nodes: input 22, output 51 / input 82, output 51; maximum sequence length: 12 / none; LSTM cell size: 128 / none; stack steps: none / 4. (A hedged RLlib configuration sketch mapping these values appears after the table.)
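
For readers unfamiliar with the baselines cited under Research Type, the following is a minimal illustrative sketch of a TWAP schedule and of implementation shortfall (IS) measured against the arrival price. The function names and the sell-order sign convention are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: not the authors' code.
# TWAP splits a parent order into (near-)equal child orders over the horizon;
# implementation shortfall measures execution cost against the arrival price.

def twap_schedule(total_shares: int, n_intervals: int) -> list:
    """Split total_shares as evenly as possible across n_intervals."""
    base, remainder = divmod(total_shares, n_intervals)
    # Spread any remainder one share at a time over the earliest intervals.
    return [base + (1 if i < remainder else 0) for i in range(n_intervals)]

def implementation_shortfall(arrival_price: float, fills: list) -> float:
    """For a sell order: arrival-price value of the shares minus realized proceeds.

    `fills` is a list of (quantity, price) tuples; a positive result means the
    execution did worse than the arrival-price benchmark.
    """
    shares = sum(q for q, _ in fills)
    proceeds = sum(q * p for q, p in fills)
    return arrival_price * shares - proceeds

# Example: 10,000 shares over 12 intervals, then IS for two partial fills.
print(twap_schedule(10_000, 12))
print(implementation_shortfall(100.0, [(5_000, 99.8), (5_000, 99.9)]))
```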
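
The calendar split quoted under Dataset Splits can be expressed as a minimal sketch, assuming the 2018 TAQ-derived features for one equity have already been aggregated into a pandas DataFrame with a DatetimeIndex; the file name and loading step are hypothetical, since the paper's data pipeline is not released.

```python
import pandas as pd

# Hypothetical input: per-interval features for one ticker over 2018,
# indexed by timestamp (the authors' actual feature pipeline is not public).
features = pd.read_parquet("features_2018.parquet").sort_index()

# Calendar split reported in the paper: Jan-Sep for training, Oct-Dec for testing.
train = features.loc["2018-01-01":"2018-09-30"]
test = features.loc["2018-10-01":"2018-12-31"]
```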
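
To make the Table 3 values concrete, the sketch below maps them onto a Ray RLlib PPO configuration for the PPO LSTM variant, using config keys from RLlib releases contemporary with the paper (e.g. sample_batch_size, which newer releases rename to rollout_fragment_length). The environment name, its registration, and the endpoint of the learning-rate schedule are assumptions, since the authors' code is not available.

```python
import ray
from ray import tune

# Sketch only: Table 3 values for the PPO LSTM variant in an older-release
# RLlib config. "OptimalExecution-v0" is a hypothetical environment name and
# the 1e6-step endpoint of the lr schedule is an assumption.
config = {
    "env": "OptimalExecution-v0",
    "sgd_minibatch_size": 32,                       # minibatch size
    "sample_batch_size": 5,                         # per-worker rollout fragment
    "train_batch_size": 240,
    "gamma": 1.0,                                   # discount factor
    "lr_schedule": [[0, 5e-5], [1_000_000, 1e-5]],  # linear annealing 5e-5 -> 1e-5
    "kl_coeff": 0.2,
    "vf_loss_coeff": 1.0,
    "entropy_coeff": 0.01,
    "clip_param": 0.2,
    "model": {
        "fcnet_hiddens": [128, 128],                # 2 hidden layers, 128 nodes each
        "fcnet_activation": "relu",                 # linear output is RLlib's default
        "use_lstm": True,
        "max_seq_len": 12,                          # maximum sequence length
        "lstm_cell_size": 128,
    },
}

ray.init()
tune.run("PPO", config=config, stop={"training_iteration": 100})
```

The PPO Stack variant would drop the three LSTM-related model keys and instead stack the most recent 4 observations into the input, matching the "stack steps: 4" row of Table 3.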