A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
Authors: Yunchang Yang, Han Zhong, Tianhao Wu, Bin Liu, Liwei Wang, Simon S. Du
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a novel reduction-based framework, which turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm that can handle stochastic delays in sequential decision-making problems. By plugging different multi-batched algorithms into our framework, we provide several examples demonstrating that our framework not only matches or improves existing results for bandits, tabular MDPs, and tabular MGs, but also provides the first line of studies on delays in sequential decision making with function approximation. In summary, we provide a complete set of sharp results for single-agent and multi-agent sequential decision-making problems with delayed feedback. |
| Researcher Affiliation | Collaboration | (1) Center for Data Science, Peking University; (2) University of California, Berkeley; (3) Zhejiang Lab; (4) National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; (5) University of Washington |
| Pseudocode | Yes | Algorithm 1 Protocol of Multi-batched Algorithm; Algorithm 2 Multi-batched Algorithm With Delayed Feedback; Algorithm 3 Phase Elimination; Algorithm 4 Multi-batched Algorithm for Tabular Markov Game; Algorithm 5 Multi-batched Algorithm for Linear Markov Games; Algorithm 6 Multi-batched V-learning |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | No | This is a theoretical paper focused on a framework and regret bounds. It conducts no empirical studies with datasets, so it does not mention access to any training dataset. |
| Dataset Splits | No | This is a theoretical paper focused on a framework and regret bounds. It conducts no empirical studies with datasets, so it does not discuss training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, therefore no hardware specifications for running experiments are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical bounds. It does not describe an empirical experimental setup, hyperparameters, or training configurations. |
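
The reduction summarized in the Research Type row can be illustrated with a toy sketch of one batch under delayed feedback. This is not the paper's Algorithm 2. The names `run_batch_with_delays`, `pull`, and `delay_of` are hypothetical, and the rule used here (keep playing the same fixed policy until `batch_size` feedbacks have arrived) is an assumption made for illustration only.

```python
import heapq


def run_batch_with_delays(pull, batch_size, delay_of, start_time=0):
    """Play one fixed policy until `batch_size` feedbacks have arrived.

    pull(t)     -> reward of the episode played at time t (hypothetical callback)
    delay_of(t) -> non-negative integer delay of that episode's feedback
    Returns (observed_rewards, next_time).
    """
    pending = []   # min-heap of (arrival_time, episode_time, reward)
    observed = []  # rewards whose feedback has already arrived
    t = start_time
    while len(observed) < batch_size:
        reward = pull(t)
        heapq.heappush(pending, (t + delay_of(t), t, reward))
        # Collect every feedback that has arrived by the end of episode t.
        while pending and pending[0][0] <= t and len(observed) < batch_size:
            _, _, r = heapq.heappop(pending)
            observed.append(r)
        t += 1
    return observed, t


# Usage: a constant delay of 3 episodes lengthens a batch of 5 by 3 episodes.
rewards, next_time = run_batch_with_delays(
    pull=lambda t: 1.0, batch_size=5, delay_of=lambda t: 3
)
print(len(rewards), next_time)
```

In this toy setting each batch is lengthened by roughly the feedback delay, so an algorithm with few batches pays the delay only a few times. The sketch does not model the stochastic delays or the regret analysis treated in the paper.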