Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

Authors: Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we test the proposed approach in two financial environments using real market data. In this section, we show an empirical analysis of the performance of TRVO (Algorithm 1) applied in two financial trading tasks:"
Researcher Affiliation | Collaboration | Politecnico di Milano; ISI Foundation; Banca IMI
Pseudocode | Yes | "Algorithm 1: Trust Region Volatility Optimization (TRVO)" (an illustrative sketch follows the table)
Open Source Code | No | The paper does not provide an explicit statement of, or a link to, open-source code for the described methodology.
Open Datasets | No | The paper mentions using 'real market data' for the S&P 500 and for the USD/EUR and USD/JPY FX rates over specific timeframes, but it does not provide concrete access information (link, DOI, specific repository, or formal citation with author/year) for the datasets used.
Dataset Splits | No | For the FX environment, the paper states that 'the training has been performed for a total of 5 × 10^7 steps on the 2017 dataset, while the testing was applied on 2018', implying a temporal train/test split (see the split sketch after the table). However, it does not provide training/validation/test percentages, absolute sample counts for each split, or references to predefined splits for all experiments.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "The policy we used is a neural network with two hidden layers and 64 neurons per hidden layer. The state consists of the last 10 days of percentage price changes, the previous portfolio position and the fraction of episode left (50 days long). The training has been performed for a total of 5 × 10^7 steps on the 2017 dataset, while the testing was applied on 2018." (a plausible reconstruction follows below)
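
The pseudocode row above refers to the paper's Algorithm 1 (TRVO). As a purely illustrative aid, not the authors' code, the sketch below shows one way a TRVO-style loop can be read, assuming the mean-volatility objective is optimized by running a standard trust-region (TRPO) update on rewards transformed with a volatility penalty r − λ(r − Ĵ)², where Ĵ estimates the current policy's average per-step reward. The helpers `collect_rollout` and `trpo_step` are hypothetical placeholders for an off-the-shelf TRPO implementation.

```python
# Minimal sketch (not the authors' code): TRVO read as TRPO applied to a
# risk-transformed reward.  `collect_rollout` and `trpo_step` are hypothetical
# placeholders for an existing TRPO implementation.
import numpy as np

def volatility_transform(rewards: np.ndarray, lam: float) -> np.ndarray:
    """Penalize per-step rewards by their squared deviation from the mean,
    so maximizing the transformed expectation trades return against volatility."""
    j_hat = rewards.mean()  # estimate of the average per-step reward
    return rewards - lam * (rewards - j_hat) ** 2

def trvo_sketch(env, policy, collect_rollout, trpo_step, lam=0.5, iterations=100):
    for _ in range(iterations):
        # 1. sample trajectories with the current policy (hypothetical helper)
        states, actions, rewards = collect_rollout(env, policy)
        # 2. replace the raw rewards with their risk-transformed counterpart
        tilde_r = volatility_transform(np.asarray(rewards, dtype=np.float64), lam)
        # 3. run one ordinary trust-region (TRPO) update on the transformed data
        policy = trpo_step(policy, states, actions, tilde_r)
    return policy
```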
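
The dataset-splits row quotes a temporal split (train on 2017, test on 2018). A minimal sketch of how such a split could be reproduced is shown below, assuming daily FX prices in a CSV file; the file name and column labels are invented for illustration, since the paper does not release its data.

```python
# Illustrative only: file name and column layout are assumptions, as the
# paper does not provide access to the market data it used.
import pandas as pd

prices = pd.read_csv("usdeur_daily.csv", parse_dates=["date"], index_col="date")

train = prices.loc["2017-01-01":"2017-12-31"]  # training period quoted in the paper
test = prices.loc["2018-01-01":"2018-12-31"]   # out-of-sample test period

# percentage price changes, as described in the experiment setup
train_returns = train["close"].pct_change().dropna()
test_returns = test["close"].pct_change().dropna()
```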
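
From the experiment-setup quote, the policy network and observation vector could be reconstructed roughly as follows. The hidden sizes (two layers of 64 units) and the state contents (last 10 daily percentage price changes, previous portfolio position, fraction of the 50-day episode left) come from the quote; the activation function, the single-output action head, and the use of PyTorch are assumptions.

```python
# Plausible reconstruction of the described policy network and state vector.
# Hidden sizes and state contents follow the paper's quoted setup; the tanh
# activation, the scalar action head, and PyTorch itself are assumptions.
import numpy as np
import torch
import torch.nn as nn

EPISODE_LEN = 50
STATE_DIM = 10 + 1 + 1  # last 10 % price changes + previous position + fraction of episode left

class PolicyNet(nn.Module):
    def __init__(self, state_dim=STATE_DIM, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)

def build_state(pct_changes, prev_position, step):
    """Observation: last 10 percentage price changes, previous portfolio
    position, and the fraction of the 50-day episode still remaining."""
    fraction_left = (EPISODE_LEN - step) / EPISODE_LEN
    obs = np.concatenate([pct_changes[-10:], [prev_position, fraction_left]])
    return torch.as_tensor(obs, dtype=torch.float32)
```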