Making RL with Preference-based Feedback Efficient via Randomization
Authors: Runzhe Wu, Wen Sun
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Overall, while our main contribution is on the theoretical side, our theoretical investigation provides several new practical insights. |
| Researcher Affiliation | Academia | Runzhe Wu, Department of Computer Science, Cornell University, rw646@cornell.edu; Wen Sun, Department of Computer Science, Cornell University, ws455@cornell.edu |
| Pseudocode | Yes | Algorithm 1 Preference-based and Randomized Least-Squares Value Iteration (PR-LSVI) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes theoretical algorithms and does not perform experiments on specific datasets. Therefore, no information about publicly available or open datasets for training is provided. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments with dataset splits. No specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, or testing is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe the execution of experiments requiring specific hardware. Therefore, no hardware specifications (e.g., GPU/CPU models, memory details) are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis rather than practical implementation details. No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are mentioned. |
| Experiment Setup | No | The paper is theoretical and defines algorithm parameters (e.g., sigma_r, sigma_P, epsilon) as part of the theoretical framework rather than specific experimental setup details for empirical evaluation. No section titled 'Experimental Setup' or similar detailing training configurations or system-level settings for experiments is present. |