No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Authors: Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys Õ(√T + C^P) regret, where C^P measures how adversarial the transition functions are and can be at most O(T). While this algorithm itself requires knowledge of C^P, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner, as in Jin et al. [2021]) and achieve Õ(U + √(U·C^L) + C^P) regret, where U is some standard gap-dependent coefficient and C^L is the amount of corruption on losses.
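For readability, the two regret bounds quoted from the abstract can be typeset as follows (this only restates the abstract's notation, with T the number of episodes, C^P the amount of transition corruption, C^L the amount of loss corruption, and U the gap-dependent coefficient; no new results are implied):

```latex
% Bound of the first algorithm (known C^P; also achieved without
% knowledge of C^P via the black-box reduction):
\tilde{O}\!\left(\sqrt{T} + C^{\mathsf{P}}\right)

% Bound of the refined algorithm in stochastically constrained
% ("easier") environments:
\tilde{O}\!\left(U + \sqrt{U\,C^{\mathsf{L}}} + C^{\mathsf{P}}\right)
```

Since C^P can be as large as O(T), the first bound degrades gracefully from the corruption-free Õ(√T) rate to the trivial O(T) rate as the transitions become fully adversarial.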
Researcher Affiliation | Academia | Tiancheng Jin, University of Southern California, tiancheng.jin@usc.edu; Junyan Liu, University of California, San Diego, jul037@ucsd.edu; Chloé Rouyer, University of Copenhagen, chloe@di.ku.dk; William Chang, University of California, Los Angeles, chang314@g.ucla.edu; Chen-Yu Wei, MIT Institute for Data, Systems, and Society, chenyuw@mit.edu; Haipeng Luo, University of Southern California, haipengl@usc.edu
Pseudocode | Yes | Algorithm 1: Algorithm for Adversarial Transitions (with Known C^P); Algorithm 2: STable Algorithm By Independent Learners and Instance SElection (STABILISE); Algorithm 3: (A Variant of) Corral; Algorithm 4: Algorithm with Optimistic Transition Achieving Gap-Dependent Bounds (Known C^P)
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | This is a theoretical research paper focused on algorithm development and regret bounds. It does not involve empirical evaluation with datasets, so no training dataset information is provided.
Dataset Splits | No | This is a theoretical research paper that does not perform empirical experiments on datasets. Therefore, no information about dataset splits (training, validation, test) is provided.
Hardware Specification | No | This is a theoretical research paper and does not describe any computational experiments that would require hardware specifications.
Software Dependencies | No | This is a theoretical research paper. It does not mention any specific software dependencies with version numbers, as no experiments are described.
Experiment Setup | No | This is a theoretical research paper focused on algorithm design and theoretical analysis. It does not include any experimental setup details such as hyperparameters or training configurations.