Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

Authors: Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we test the proposed approach in two financial environments using real market data. In this section, we show an empirical analysis of the performance of TRVO (Algorithm 1) applied in two financial trading tasks:"
Researcher Affiliation | Collaboration | Politecnico di Milano; ISI Foundation; Banca IMI
Pseudocode | Yes | "Algorithm 1: Trust Region Volatility Optimization (TRVO)" (an illustrative sketch follows the table)
Open Source Code | No | The paper does not provide an explicit statement of, or a link to, open-source code for the described methodology.
Open Datasets | No | The paper mentions using 'real market data' for the S&P 500 and for the USD/EUR and USD/JPY FX rates over specific timeframes, but it does not provide concrete access information (link, DOI, specific repository, or formal citation with author/year) for the datasets used.
Dataset Splits | No | For the FX environment, the paper states that 'the training has been performed for a total of 5 × 10^7 steps on the 2017 dataset, while the testing was applied on 2018', implying a temporal train/test split (see the split sketch after the table). However, it does not provide training/validation/test percentages, absolute sample counts for each split, or references to predefined splits for all experiments.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "The policy we used is a neural network with two hidden layers and 64 neurons per hidden layer. The state consists of the last 10 days of percentage price changes, the previous portfolio position and the fraction of episode left (50 days long). The training has been performed for a total of 5 × 10^7 steps on the 2017 dataset, while the testing was applied on 2018." (a plausible reconstruction follows below)
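
The pseudocode row above refers to the paper's Algorithm 1 (TRVO). As a purely illustrative aid, not the authors' code, the sketch below shows one way a TRVO-style loop can be read, assuming the mean-volatility objective is optimized by running a standard trust-region (TRPO) update on rewards transformed with a volatility penalty r − λ(r − Ĵ)², where Ĵ estimates the current policy's average per-step reward. The helpers `collect_rollout` and `trpo_step` are hypothetical placeholders for an off-the-shelf TRPO implementation.

```python
# Minimal sketch (not the authors' code): TRVO read as TRPO applied to a
# risk-transformed reward.  `collect_rollout` and `trpo_step` are hypothetical
# placeholders for an existing TRPO implementation.
import numpy as np

def volatility_transform(rewards: np.ndarray, lam: float) -> np.ndarray:
    """Penalize per-step rewards by their squared deviation from the mean,
    so maximizing the transformed expectation trades return against volatility."""
    j_hat = rewards.mean()  # estimate of the average per-step reward
    return rewards - lam * (rewards - j_hat) ** 2

def trvo_sketch(env, policy, collect_rollout, trpo_step, lam=0.5, iterations=100):
    for _ in range(iterations):
        # 1. sample trajectories with the current policy (hypothetical helper)
        states, actions, rewards = collect_rollout(env, policy)
        # 2. replace the raw rewards with their risk-transformed counterpart
        tilde_r = volatility_transform(np.asarray(rewards, dtype=np.float64), lam)
        # 3. run one ordinary trust-region (TRPO) update on the transformed data
        policy = trpo_step(policy, states, actions, tilde_r)
    return policy
```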
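
The dataset-splits row quotes a temporal split (train on 2017, test on 2018). A minimal sketch of how such a split could be reproduced is shown below, assuming daily FX prices in a CSV file; the file name and column labels are invented for illustration, since the paper does not release its data.

```python
# Illustrative only: file name and column layout are assumptions, as the
# paper does not provide access to the market data it used.
import pandas as pd

prices = pd.read_csv("usdeur_daily.csv", parse_dates=["date"], index_col="date")

train = prices.loc["2017-01-01":"2017-12-31"]  # training period quoted in the paper
test = prices.loc["2018-01-01":"2018-12-31"]   # out-of-sample test period

# percentage price changes, as described in the experiment setup
train_returns = train["close"].pct_change().dropna()
test_returns = test["close"].pct_change().dropna()
```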
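
From the experiment-setup quote, the policy network and observation vector could be reconstructed roughly as follows. The hidden sizes (two layers of 64 units) and the state contents (last 10 daily percentage price changes, previous portfolio position, fraction of the 50-day episode left) come from the quote; the activation function, the single-output action head, and the use of PyTorch are assumptions.

```python
# Plausible reconstruction of the described policy network and state vector.
# Hidden sizes and state contents follow the paper's quoted setup; the tanh
# activation, the scalar action head, and PyTorch itself are assumptions.
import numpy as np
import torch
import torch.nn as nn

EPISODE_LEN = 50
STATE_DIM = 10 + 1 + 1  # last 10 % price changes + previous position + fraction of episode left

class PolicyNet(nn.Module):
    def __init__(self, state_dim=STATE_DIM, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)

def build_state(pct_changes, prev_position, step):
    """Observation: last 10 percentage price changes, previous portfolio
    position, and the fraction of the 50-day episode still remaining."""
    fraction_left = (EPISODE_LEN - step) / EPISODE_LEN
    obs = np.concatenate([pct_changes[-10:], [prev_position, fraction_left]])
    return torch.as_tensor(obs, dtype=torch.float32)
```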