Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Authors: Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both statistical and computational efficiency. [...] Finally, we establish the first lower bound for this problem, justifying the optimality of our results in d and K. |
| Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) The University of Tokyo, Chiba, Japan |
| Pseudocode | Yes | Algorithm 1 UCRL-MNL-LL |
| Open Source Code | No | The paper states it is a theoretical paper and does not include experiments. It does not provide any links to open-source code for its methodology. |
| Open Datasets | No | The paper is theoretical and does not report on experiments using datasets. |
| Dataset Splits | No | The paper is theoretical and does not report on experiments using datasets, thus no dataset splits for validation are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
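The Research Type excerpt says that MNL function approximation ensures valid probability distributions over the state space. The following is a minimal sketch of why, assuming the common MNL transition form. Each reachable next state s' is given a feature vector φ(s, a, s'), and a shared parameter θ ∈ R^d is used. The transition probabilities are then a softmax of the scores φ(s, a, s')ᵀθ over the reachable set, so they are positive and sum to one by construction. The function name and array shapes are hypothetical illustrations. This is not the paper's UCRL-MNL-LL algorithm.

```python
import numpy as np


def mnl_transition_probs(features: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Hypothetical helper: MNL transition probabilities for one (s, a) pair.

    features: array of shape (num_reachable_states, d); row k is phi(s, a, s'_k).
    theta:    array of shape (d,), the shared MNL parameter.
    Returns an array of shape (num_reachable_states,) with P(s'_k | s, a).
    """
    logits = features @ theta          # one score per reachable next state
    logits = logits - logits.max()     # shift for numerical stability; softmax is unchanged
    weights = np.exp(logits)           # strictly positive weights
    return weights / weights.sum()     # normalize so the probabilities sum to one


if __name__ == "__main__":
    phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    theta = np.array([0.5, -0.5])
    probs = mnl_transition_probs(phi, theta)
    print(probs, probs.sum())
```

A purely linear model (using `features @ theta` directly as probabilities) can produce negative values or totals other than one. The exponentiate-and-normalize step is what makes every output a valid distribution. It is also the non-linearity that the excerpt identifies as the source of the statistical and computational challenges.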