Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
Authors: Taehyun Hwang, Min-hwan Oh
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and practical superior performance. Numerical Experiments: In this section, we evaluate the performances of our proposed algorithm, UCRL-MNL in numerical experiments. |
| Researcher Affiliation | Academia | Taehyun Hwang, Min-hwan Oh* Seoul National University, Seoul, Republic of Korea EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Upper Confidence Model-based RL for MNL Transition Model (UCRL-MNL) |
| Open Source Code | No | The paper does not contain an unambiguous statement of releasing the code for the described methodology or a direct link to a source-code repository. |
| Open Datasets | No | The paper uses the 'River Swim environment' but does not provide a specific link, DOI, or repository for a publicly available dataset, nor does it provide a formal citation with author/year for a dataset itself, only for the description of the environment. |
| Dataset Splits | No | The paper does not specify exact split percentages, absolute sample counts, or reference predefined splits with citations for training, validation, or test sets. It mentions '10 independent runs' but not data partitioning for model validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper does not mention specific software components with version numbers required for replication. |
| Experiment Setup | No | The paper states 'To set the hyperparameters for each algorithm, we performed a grid search over certain ranges' but does not provide the specific hyperparameter values, training configurations, or system-level settings used. |