Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

OPHR: Mastering Volatility Trading with Multi-Agent Deep Reinforcement Learning

Authors: Zeting Chen, Xinyu Cai, Molei Qin, Bo An

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluating our approach using cryptocurrency options data from 2021-2024, we demonstrate superior performance on BTC and ETH, significantly outperforming traditional strategies and machine learning baselines across all profit and risk-adjusted metrics while exhibiting sophisticated trading behavior.
Researcher Affiliation | Collaboration | Zeting Chen, Nanyang Technological University, Singapore, EMAIL; Xinyu Cai, Nanyang Technological University, Singapore, EMAIL; Molei Qin, Nanyang Technological University, Singapore, EMAIL; Bo An, Nanyang Technological University, Skywork AI, Singapore, EMAIL
Pseudocode | Yes | The algorithms for online learning are presented in Algorithm 1. Algorithm 1: OP-Agent Online Training via n_op-step TD Error. The training algorithm of the HR-Agent is demonstrated in Algorithm 2. Algorithm 2: HR-Agent Online Training via 1-step TD Error. The detailed implementation is presented in Algorithm 3 in Appendix B.3. Algorithm 3: OPHR Joint Training.
Open Source Code | Yes | The code framework and sample data of this paper have been released on https://github.com/Edwicn/OPHRMastering Volatility Tradingwith Multi Agent Deep Reinforcement Learning
Open Datasets | Yes | The code framework and sample data of this paper have been released on https://github.com/Edwicn/OPHRMastering Volatility Tradingwith Multi Agent Deep Reinforcement Learning
Dataset Splits | Yes | To comprehensively evaluate the proposed algorithm, we conduct experiments on BTC and ETH options data obtained from Deribit. The dataset splitting is shown in Table 1. This period covers diverse market conditions, including bull markets (e.g., 2019 and 2021), bear markets (e.g., 2022), and periods of elevated volatility (e.g., the 2020 COVID-19 pandemic and the 2022 crypto market crash). We utilize hourly-level data to capture intraday price dynamics and volatility structures critical to volatility trading. The dataset includes complete options chains across a wide range of strikes and expirations (weekly to quarterly), as well as comprehensive market indicators such as implied volatility surfaces, open interest, and trading volume. Table 1: Dataset Splits for Experiments (columns: Dataset; Train; Validation; Test). BTCUSD: 19/04/01 - 22/12/31; 23/01/01 - 23/06/30; 23/07/01 - 24/07/01. ETHUSD: 19/04/01 - 22/12/31; 23/01/01 - 23/06/30; 23/07/01 - 24/07/01.
Hardware Specification | Yes | We conducted all experiments on a server equipped with 4 NVIDIA RTX 4090 GPUs and an AMD Ryzen Threadripper PRO 5995WX CPU.
Software Dependencies | No | The paper describes algorithms (DQN, Actor-Critic) and models (GBDT, MLP, LSTM) but does not list specific versions of libraries (e.g., Python, PyTorch, TensorFlow, scikit-learn) or programming languages used.
Experiment Setup | Yes | In Section 4.1, we introduce the N-step temporal-difference Double DQN algorithm applied within a rolling training framework. The main hyperparameter settings for this algorithm are summarized in Table 5. In our experiments, we adopt a rolling training of every 10 days and apply the proposed algorithm to historical data, and the network is updated using 12-step temporal-difference (TD) learning. Table 5: Hyperparameters for OP-Agent Training. Table 6: Hedger parameters. Table 7: Hyperparameters for HR-Agent Training.
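
Illustrative note: the Experiment Setup entry mentions an N-step temporal-difference Double DQN update with 12 steps. The sketch below shows how such a target is commonly computed in general. It is not taken from the paper or its repository: the function name, argument names, and the default discount factor are assumptions made for illustration, and the paper's actual hyperparameters are in its Table 5, which is not reproduced here.

```python
from typing import Sequence


def n_step_double_dqn_target(
    rewards: Sequence[float],
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed default, not the paper's value
    done: bool = False,
) -> float:
    """Generic n-step Double DQN target for one window of transitions.

    rewards:        the n rewards r_t, ..., r_{t+n-1} (n = 12 in the quoted setup)
    q_online_next:  online-network Q-values at state s_{t+n}, one per action
    q_target_next:  target-network Q-values at the same state
    """
    # Discounted sum of the n observed rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        # Episode ended inside the window, so there is nothing to bootstrap from.
        return n_step_return
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return n_step_return + (gamma ** len(rewards)) * q_target_next[best_action]


# Usage with toy numbers (hypothetical values, two actions, two-step window).
example = n_step_double_dqn_target(
    rewards=[1.0, 1.0],
    q_online_next=[0.2, 0.9],
    q_target_next=[4.0, 8.0],
    gamma=0.5,
)
```

Selecting the action with one network and scoring it with the other is what distinguishes Double DQN from plain DQN; the window length only changes how many observed rewards are summed before bootstrapping.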