Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

Authors: Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both statistical and computational efficiency. [...] Finally, we establish the first lower bound for this problem, justifying the optimality of our results in d and K.
Researcher Affiliation | Academia | 1 National Key Laboratory for Novel Software Technology, Nanjing University, China; 2 School of Artificial Intelligence, Nanjing University, China; 3 The University of Tokyo, Chiba, Japan
Pseudocode | Yes | Algorithm 1: UCRL-MNL-LL
Open Source Code | No | The paper states it is a theoretical paper and does not include experiments. It does not provide any links to open-source code for its methodology.
Open Datasets | No | The paper is theoretical and does not report on experiments using datasets.
Dataset Splits | No | The paper is theoretical and does not report on experiments using datasets, thus no dataset splits for validation are provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
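
The Research Type entry above says that MNL function approximation ensures valid probability distributions over the state space. The following is a minimal illustrative sketch of that property, assuming the standard multinomial logit (softmax) form over linear scores. It is not the paper's UCRL-MNL-LL algorithm, and the function name `mnl_transition_probs`, the feature vectors, and the parameter values are all hypothetical choices made for illustration.

```python
import math

def mnl_transition_probs(features, theta):
    """Softmax over linear scores, one score per candidate next state.

    features: list of feature vectors, one per candidate next state (hypothetical).
    theta: parameter vector of the same dimension (hypothetical).
    """
    # Linear score for each candidate next state: dot(phi_i, theta).
    logits = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
    # Subtract the largest score before exponentiating for numerical stability.
    shift = max(logits)
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    # Normalizing makes every entry positive and the entries sum to one.
    return [w / total for w in weights]

# Made-up example: three candidate next states, two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
theta = [0.5, -2.0]
probs = mnl_transition_probs(features, theta)
print(probs)
```

In this toy example the raw linear scores are 0.5, -2.0, and 1.5, which cannot be read as probabilities directly because one is negative and they do not sum to one. Passing them through the logit normalization yields strictly positive values that sum to one, which is the sense in which the MNL form guarantees a valid distribution.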