Contrastive Learning and Reward Smoothing for Deep Portfolio Management

Authors: Yun-Hsuan Lien, Yuan-Kui Li, Yu-Shuen Wang

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested our method against various traditional financial techniques and other deep RL methods, and found it to be effective in both the U.S. stock market and the cryptocurrency market.
Researcher Affiliation | Academia | National Yang Ming Chiao Tung University, Taiwan
Pseudocode | No | The paper describes the policy gradient method and implementation details but does not include any pseudocode blocks or algorithms.
Open Source Code | Yes | Our source code is available at https://github.com/sophialien/FinTechDPM.
Open Datasets | No | The data was obtained from the Yahoo Finance Application Programming Interface (API) and Poloniex's official API, respectively. The paper describes the source of the data but does not provide a direct link, DOI, or formal citation for accessing a publicly available dataset itself.
Dataset Splits | No | The paper explicitly describes the test set split ('the last year's data were used for testing', 'the last two months of data were used for testing') but does not explicitly mention a separate validation dataset split.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions 'AdamW optimizer' but does not provide specific version numbers for it or other software libraries/frameworks like Python, PyTorch, or CUDA.
Experiment Setup | Yes | The discount factor γ is set to 1, the length of reward smoothing F (Equation 12) is set to 5, and the temperature τ is set to 0.05 (Equation 6). The learning rate is set to 0.0001/0.00015 and the batch size is set to 300/500 for the U.S. stock/cryptocurrency market, respectively. The value of γ is set to 5e-5 for the cryptocurrency market and 5e-4 for the U.S. stock market. We set the NRI graph to contain 25 nodes, and divide the asset states into 20n groups. The learning rate is set to 0.00015 for the cryptocurrency market and 0.0001 for the U.S. stock market, and the temperature ν is set to 0.5.
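
The reported experiment setup could be collected into a per-market configuration along the following lines. This is a minimal sketch, not the authors' code: the class name, field names, and market keys are hypothetical, the exponents of the second γ value are taken to be negative (5e-5 and 5e-4), and the role of that second γ is not stated in this summary, so it is stored under a neutral name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters from the Experiment Setup row (all names are hypothetical)."""

    market: str
    learning_rate: float
    batch_size: int
    second_gamma: float  # second quantity the summary labels γ; its role is not stated
    discount_factor: float = 1.0  # γ, the discount factor
    smoothing_length: int = 5  # F, length of reward smoothing (Equation 12)
    contrastive_temperature: float = 0.05  # τ (Equation 6)
    nri_nodes: int = 25  # number of nodes in the NRI graph
    asset_group_factor: int = 20  # summary says "20n groups"; n is not defined there
    nu_temperature: float = 0.5  # ν


# One entry per market, using the values reported for each.
CONFIGS = {
    "us_stock": ExperimentConfig(
        market="us_stock", learning_rate=1e-4, batch_size=300, second_gamma=5e-4
    ),
    "crypto": ExperimentConfig(
        market="crypto", learning_rate=1.5e-4, batch_size=500, second_gamma=5e-5
    ),
}
```

Keeping the shared values as defaults and the market-specific values as required fields makes it harder to mix up the two settings when rerunning either experiment.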