Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States
Authors: Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li (pp. 1112-1119)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two real-world datasets, (i) Bitcoin market and (ii) High Tech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines. |
| Researcher Affiliation | Collaboration | Yunan Ye,1 Hengzhi Pei,2 Boxin Wang,3 Pin-Yu Chen,4 Yada Zhu,4 Jun Xiao,1 Bo Li3 1Zhejiang University, 2Fudan University, 3University of Illinois at Urbana-Champaign, 4IBM Research |
| Pseudocode | No | The paper describes the method using equations and textual explanations but does not include structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the following two datasets from different markets. Bitcoin (Jiang, Xu, and Liang 2017) contains the prices of 10 different cryptocurrencies from 2015-06-30 to 2017-06-30. For every cryptocurrency, we have 35089 data points representing prices recorded in a half-hour interval. High Tech (Ding et al. 2014) consists of both daily closing asset price and financial news from 2006-10-20 to 2013-11-20. |
| Dataset Splits | No | The paper specifies training and testing splits for both datasets (e.g., 'training (32313 data point) and testing (2776 data point) parts chronologically' for Bitcoin, and '1529 days for training and 255 days for testing chronologically' for High Tech), but does not explicitly mention a separate validation split. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU/GPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions various methods and models like LSTM, HAN, Glove, Word2Vec, Fasttext, Auto Phrase, DPG, PPO, and PG, with citations to their respective papers. However, it does not specify version numbers for any software libraries, frameworks, or packages used for implementation (e.g., TensorFlow 2.x, PyTorch 1.x, scikit-learn 0.x). |
| Experiment Setup | Yes | In Bitcoin dataset, we use the previous prices of the past 30 days to train a classifier for price up/down prediction. We employ a neural network based on LSTM as an encoder and the classifier has 65.08% training and 60.10% testing accuracy. In High Tech dataset, we use the financial news related to stocks for classifier training. We choose Glove as the embedding method and employ a HAN as an encoder to obtain a 100-dimensional embedding vector of stock movement prediction for each news. In our SARL training, we use the prices of past 30 days as standard state s. In High Tech, the average news embeddings of past 12 days are used for state augmentation. We set cb = cs = 0.25%, where cb and cs are the constant commission rates for buying and selling. We set Rf = 2% as a typical bank interest value. |
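The experiment setup above fixes two quantities that determine how profits are scored in this kind of portfolio-management evaluation: the commission rates cb = cs = 0.25% charged on trades, and the risk-free rate Rf = 2% used for risk-adjusted profit. A minimal sketch of these mechanics is below; it is not the paper's implementation, and the function names, the single-rate approximation of buy/sell commissions via turnover, and the 252-period annualization are illustrative assumptions.

```python
import numpy as np

def step_portfolio_value(value, w_old, w_new, price_relatives, commission=0.0025):
    """One-period wealth update with proportional transaction costs.

    Illustrative sketch: approximates the cb = cs = 0.25% buy/sell
    commissions with a single rate charged on turnover.
    `price_relatives` are p_t / p_{t-1} per asset.
    """
    w_old = np.asarray(w_old, dtype=float)
    price_relatives = np.asarray(price_relatives, dtype=float)
    # Growth from holding the old allocation for one period
    growth = float(np.dot(w_old, price_relatives))
    # Allocation after prices move, before rebalancing to w_new
    drifted = w_old * price_relatives / growth
    # Commission charged on the fraction of wealth traded to reach w_new
    turnover = float(np.abs(np.asarray(w_new, dtype=float) - drifted).sum())
    return value * growth * (1.0 - commission * turnover)

def sharpe_ratio(period_returns, rf_annual=0.02, periods_per_year=252):
    """Annualized Sharpe ratio against the risk-free rate Rf = 2%."""
    r = np.asarray(period_returns, dtype=float)
    excess = r - rf_annual / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std())
```

For example, with equal weights, unchanged prices, and no rebalancing, the update leaves wealth untouched (zero turnover means zero commission); any change in target weights beyond the price-drifted allocation is what incurs cost.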