Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
Authors: Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we test the proposed approach in two financial environments using real market data. In this section, we show an empirical analysis of the performance of TRVO (Algorithm 1) applied in two financial trading tasks: |
| Researcher Affiliation | Collaboration | ¹Politecnico di Milano, ²ISI Foundation, ³Banca IMI |
| Pseudocode | Yes | Algorithm 1 Trust Region Volatility Optimization (TRVO) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | No | The paper mentions using 'real market data' for 'S&P 500' and 'FX rates USD/EUR and USD/JPY' from specific timeframes, but does not provide concrete access information (link, DOI, specific repository, or formal citation with author/year) to the specific datasets used. |
| Dataset Splits | No | The paper mentions 'The training has been performed for a total of 5·10⁷ steps on the 2017 dataset, while the testing was applied on 2018' for the FX environment, implying a temporal train/test split. However, it does not provide specific training/validation/test percentages, absolute sample counts for each split, or references to predefined splits with citations for reproducibility across all experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | The policy we used is a neural network with two hidden layers and 64 neurons per hidden layer. The state consists of the last 10 days of percentage price changes, the previous portfolio position and the fraction of episode left (50 days long). The training has been performed for a total of 5·10⁷ steps on the 2017 dataset, while the testing was applied on 2018. |
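The policy architecture reported in the experiment setup (two hidden layers of 64 units, a 12-dimensional state of 10 past percentage price changes plus the previous portfolio position and the fraction of episode left) can be sketched as a small MLP. This is a minimal illustration only: the activation function (tanh), the weight initialization, and the scalar action head bounded in [-1, 1] are assumptions, since the paper does not specify them.

```python
import numpy as np

STATE_DIM = 10 + 1 + 1  # 10 past % price changes + prev. position + time left
HIDDEN = 64             # per the paper: 64 neurons per hidden layer

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Simple scaled-Gaussian init; the paper does not state the initializer.
    return rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

W1, b1 = init_layer(STATE_DIM, HIDDEN)
W2, b2 = init_layer(HIDDEN, HIDDEN)
W3, b3 = init_layer(HIDDEN, 1)

def policy_mean(state):
    """Forward pass: returns the mean action (portfolio position in [-1, 1])."""
    h = np.tanh(state @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    return np.tanh(h @ W3 + b3)

# Example state: 10 daily % changes, flat previous position, full episode left.
state = np.concatenate([rng.normal(0, 0.01, 10), [0.0], [1.0]])
action = policy_mean(state)
```

In TRVO the network would parameterize a stochastic policy updated under a trust-region constraint; the forward pass above only illustrates the stated layer sizes and state layout.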