Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Authors: Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both statistical and computational efficiency. [...] Finally, we establish the first lower bound for this problem, justifying the optimality of our results in d and K. |
| Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) The University of Tokyo, Chiba, Japan |
| Pseudocode | Yes | Algorithm 1 UCRL-MNL-LL |
| Open Source Code | No | The paper states it is a theoretical paper and does not include experiments. It does not provide any links to open-source code for its methodology. |
| Open Datasets | No | The paper is theoretical and does not report on experiments using datasets. |
| Dataset Splits | No | The paper is theoretical and does not report on experiments using datasets, so no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |