Robust Market Making via Adversarial Reinforcement Learning
Authors: Thomas Spooner, Rahul Savani
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare two conventional single-agent RL agents with ARL, and show that our ARL approach leads to: 1) the emergence of risk-averse behaviour without constraints or domain-specific penalties; 2) significant improvements in performance across a set of standard metrics, evaluated with or without an adversary in the test environment; and 3) improved robustness to model uncertainty. We empirically demonstrate that our ARL method consistently converges, and we prove for several special cases that the profiles that we converge to correspond to Nash equilibria in a simplified single-stage game. |
| Researcher Affiliation | Academia | Thomas Spooner and Rahul Savani Department of Computer Science, University of Liverpool {t.spooner, rahul.savani}@liverpool.ac.uk |
| Pseudocode | No | The paper describes algorithms such as 'NAC-S(λ)' and adaptations of 'RARL', but it does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Software. All our code is freely accessible on GitHub: https://github.com/tspooner/rmm.arl. |
| Open Datasets | No | The paper explicitly states it uses an analytical model rather than a data-driven approach: 'Using an analytical model allows us to examine the characteristics of adversarial training in isolation while minimising systematic error due to bias often present in historical data.' Therefore, no publicly available or open dataset is used. |
| Dataset Splits | No | The paper describes simulation parameters and training duration (e.g., 'value function was pre-trained for 1000 episodes', 'trained for 10^6 episodes'), but it does not specify traditional dataset splits (e.g., train/validation/test percentages or sample counts) as it utilizes a simulation model to generate data on the fly rather than using a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU specifications, or memory used for running the experiments. It only describes the software and training setup. |
| Software Dependencies | No | The paper mentions using 'NAC-S(λ) algorithm' and 'semi-gradient SARSA(λ)' for policy evaluation, but it does not list any specific software packages, libraries, or solvers with version numbers (e.g., Python 3.8, PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | In each of the experiments to follow, the value function was pre-trained for 1000 episodes (with a learning rate of 10^-3) to reduce variance in early policy updates. Both the value function and policy were then trained for 10^6 episodes, with policy updates every 100 time steps, and a learning rate of 10^-4 for both the critic and policy. The value function was configured to learn λ = 0.97 returns. The starting time was chosen uniformly at random from the interval t0 ∈ [0.0, 0.95], with starting price Z0 = 100 and inventory H0 ∈ [H̲ = -50, H̄ = 50]. Innovations in Zn occurred with fixed volatility σ = 2 between [t0, 1] with increment Δt = 0.005. |
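
The Experiment Setup row pins down the simulation and training constants but not the code that uses them. The sketch below simply collects those constants and draws one episode's initial conditions and mid-price path. The arithmetic Brownian increment Z_{n+1} = Z_n + σ·sqrt(Δt)·ξ_n and the uniform draw of the initial inventory from its bounds are assumptions about how "fixed volatility σ = 2" and the interval for H0 are used; they are not confirmed by the excerpt, and all names here are illustrative.

```python
import numpy as np

# Constants quoted in the paper's experiment setup (see the table row above).
Z0 = 100.0                    # starting mid-price
SIGMA = 2.0                   # fixed volatility of mid-price innovations
DT = 0.005                    # time increment Delta-t
T0_LOW, T0_HIGH = 0.0, 0.95   # starting time drawn uniformly from [0.0, 0.95]
H_MIN, H_MAX = -50, 50        # bounds on the initial inventory H0

PRETRAIN_EPISODES = 1_000     # value-function pre-training episodes
PRETRAIN_LR = 1e-3            # pre-training learning rate
TRAIN_EPISODES = 10**6        # main training episodes
POLICY_UPDATE_EVERY = 100     # time steps between policy updates
CRITIC_LR = POLICY_LR = 1e-4  # learning rates for critic and policy
LAMBDA = 0.97                 # lambda used for the critic's lambda-returns


def sample_episode(rng: np.random.Generator):
    """Draw one episode's initial conditions and a mid-price path on [t0, 1].

    Assumptions (not stated in the excerpt): the initial inventory is drawn
    uniformly from its bounds, and mid-price innovations are arithmetic
    Brownian increments Z_{n+1} = Z_n + sigma * sqrt(dt) * xi_n.
    """
    t0 = rng.uniform(T0_LOW, T0_HIGH)
    h0 = int(rng.integers(H_MIN, H_MAX + 1))
    n_steps = int(round((1.0 - t0) / DT))
    increments = SIGMA * np.sqrt(DT) * rng.standard_normal(n_steps)
    z_path = Z0 + np.concatenate(([0.0], np.cumsum(increments)))
    return t0, h0, z_path


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t0, h0, z_path = sample_episode(rng)
    print(f"t0={t0:.3f}, H0={h0}, steps={len(z_path) - 1}, Z_1={z_path[-1]:.2f}")
```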
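The setup also characterises the critic: a value function learning λ = 0.97 returns with a 10^-4 learning rate, evaluated with semi-gradient SARSA(λ) inside NAC-S(λ) (see the Software Dependencies row). Since the paper provides no pseudocode (see the Pseudocode row), the following is only a generic, minimal linear semi-gradient SARSA(λ) critic with accumulating traces, not the authors' implementation; the class and method names are illustrative and the natural-gradient policy update of NAC-S(λ) is omitted entirely.

```python
import numpy as np


class SarsaLambdaCritic:
    """Linear semi-gradient SARSA(lambda) critic with accumulating traces.

    Generic textbook sketch only: the paper names semi-gradient SARSA(lambda)
    as its policy-evaluation method but gives no pseudocode, so this class is
    illustrative rather than a reconstruction of the authors' code.
    """

    def __init__(self, n_features, alpha=1e-4, gamma=1.0, lam=0.97):
        self.w = np.zeros(n_features)   # weights of the linear value estimate
        self.z = np.zeros(n_features)   # eligibility trace
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def start_episode(self):
        """Reset the eligibility trace at the start of each episode."""
        self.z[:] = 0.0

    def q(self, phi):
        """Value estimate for a state-action feature vector phi(s, a)."""
        return float(self.w @ phi)

    def update(self, phi, reward, phi_next=None):
        """Apply one on-policy transition; pass phi_next=None at episode end."""
        target = reward if phi_next is None else reward + self.gamma * self.q(phi_next)
        delta = target - self.q(phi)                    # TD error
        self.z = self.gamma * self.lam * self.z + phi   # accumulating traces
        self.w += self.alpha * delta * self.z           # semi-gradient step
        return delta
```

With the quoted settings, such a critic would be constructed as `SarsaLambdaCritic(n_features, alpha=1e-4, lam=0.97)` and its trace reset via `start_episode()` before each training episode.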