Reinforcement Learning with Trajectory Feedback
Authors: Yonathan Efroni, Nadav Merlis, Shie Mannor
AAAI 2021, pp. 7288-7295
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper extends reinforcement learning algorithms to a new feedback setting, analyzes their regret, and establishes performance guarantees and computational tractability. It includes theorems, lemmas, and pseudocode (Algorithms 1, 2, 3), but contains no empirical evaluation: no datasets, performance metrics, or results from actual runs. |
| Researcher Affiliation | Collaboration | Yonathan Efroni (1,2), Nadav Merlis (1), and Shie Mannor (1,3). 1: Technion, Israel Institute of Technology; 2: Microsoft Research, New York; 3: Nvidia Research, Israel |
| Pseudocode | Yes | Algorithm 1 OFUL for RL with Trajectory Feedback and Known Model; Algorithm 2 TS for RL with Trajectory Feedback and Known Model; Algorithm 3 UCBVI-TS for RL with Trajectory Feedback |
| Open Source Code | No | The paper does not provide any statements about open-sourcing the code, nor does it include links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a specific dataset, thus no public dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, thus no dataset split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis; it does not list any software dependencies with specific version numbers required for reproducing empirical results. |
| Experiment Setup | No | The paper describes theoretical algorithms and their analysis (e.g., regret bounds); it does not include details about an empirical experimental setup such as hyperparameters or training configurations. |
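The core idea behind the paper's setting can be illustrated with a small sketch (not code from the paper, which provides only pseudocode): in trajectory feedback, the agent observes only the *sum* of rewards along a trajectory, not per-step rewards. Treating a trajectory's state-action visit counts as a feature vector, the per-pair mean rewards can be recovered with regularized least squares, the estimator underlying OFUL-style algorithms such as the paper's Algorithm 1. All variable names and the toy data-generating process below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sa = 6      # illustrative number of (state, action) pairs
horizon = 4   # trajectory length
n_traj = 500  # number of observed trajectories
lam = 1.0     # ridge regularization parameter

# Hypothetical ground-truth per-pair mean rewards (unknown to the learner).
true_r = rng.uniform(0.0, 1.0, n_sa)

# Each trajectory is summarized by its visit counts x; the learner sees
# only the trajectory-level return y = <x, true_r> + noise.
X = np.zeros((n_traj, n_sa))
y = np.zeros(n_traj)
for i in range(n_traj):
    visits = rng.integers(0, n_sa, size=horizon)          # toy random trajectory
    x = np.bincount(visits, minlength=n_sa).astype(float)  # visit-count features
    X[i] = x
    y[i] = x @ true_r + rng.normal(0.0, 0.1)

# Regularized least-squares estimate of per-pair rewards from sum feedback.
A = X.T @ X + lam * np.eye(n_sa)
r_hat = np.linalg.solve(A, X.T @ y)

print("max estimation error:", np.max(np.abs(r_hat - true_r)))
```

With enough trajectories the estimate converges to the true per-pair rewards even though no individual reward is ever observed; optimistic algorithms like those in the paper add a confidence bonus derived from the design matrix `A` on top of this estimate.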