No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Authors: Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{O}(\sqrt{T} + C^{\mathsf{P}})$ regret, where $C^{\mathsf{P}}$ measures how adversarial the transition functions are and can be at most $O(T)$. While this algorithm itself requires knowledge of $C^{\mathsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{O}(U + \sqrt{U C^{\mathsf{L}}} + C^{\mathsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\mathsf{L}}$ is the amount of corruption on losses. (The bounds are restated in LaTeX after the table.) |
| Researcher Affiliation | Academia | Tiancheng Jin, University of Southern California, tiancheng.jin@usc.edu; Junyan Liu, University of California, San Diego, jul037@ucsd.edu; Chloé Rouyer, University of Copenhagen, chloe@di.ku.dk; William Chang, University of California, Los Angeles, chang314@g.ucla.edu; Chen-Yu Wei, MIT Institute for Data, Systems, and Society, chenyuw@mit.edu; Haipeng Luo, University of Southern California, haipengl@usc.edu |
| Pseudocode | Yes | Algorithm 1 Algorithm for Adversarial Transitions (with Known CP); Algorithm 2 STable Algorithm By Independent Learners and Instance SElection (STABILISE); Algorithm 3 (A Variant of) Corral; Algorithm 4 Algorithm with Optimistic Transition Achieving Gap-Dependent Bounds (Known CP) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | This is a theoretical research paper focused on algorithm development and regret bounds. It does not involve empirical evaluation with datasets, so no training dataset information is provided. |
| Dataset Splits | No | This is a theoretical research paper that does not perform empirical experiments on datasets. Therefore, no information about dataset splits (training, validation, test) is provided. |
| Hardware Specification | No | This is a theoretical research paper and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | This is a theoretical research paper. It does not mention any specific software dependencies with version numbers, as no experiments are described. |
| Experiment Setup | No | This is a theoretical research paper focused on algorithm design and theoretical analysis. It does not include any experimental setup details such as hyperparameters or training configurations. |
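For readability, the regret bounds quoted in the Research Type row above can be restated in standard notation. This is a minimal transcription of the bounds as quoted, assuming $C^{\mathsf{P}}$ denotes the transition-corruption measure, $C^{\mathsf{L}}$ the loss-corruption measure, and $U$ the gap-dependent coefficient mentioned in the abstract; it is a sketch for orientation, not a restatement of the paper's formal theorems.

```latex
% Bound for the known-C^P algorithm (Algorithm 1), as quoted:
\mathrm{Reg}_T = \widetilde{O}\!\left(\sqrt{T} + C^{\mathsf{P}}\right),
\qquad \text{with } C^{\mathsf{P}} \le O(T).

% Refined, gap-dependent bound under stochastically constrained losses, as quoted:
\mathrm{Reg}_T = \widetilde{O}\!\left(U + \sqrt{U\, C^{\mathsf{L}}} + C^{\mathsf{P}}\right).
```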