Provable Interactive Learning with Hindsight Instruction Feedback
Authors: Dipendra Misra, Aldo Pacchiano, Robert E. Schapire
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experiments showing the performance of LORIL in practice for 2 domains. |
| Researcher Affiliation | Collaboration | (1) Microsoft Research; (2) Broad Institute of MIT and Harvard, Boston University. |
| Pseudocode | Yes | Algorithm 1 LORIL(g, F): Learning in LOw-Rank models from Instruction Labels |
| Open Source Code | Yes | The code for all experiments in the paper can be found at https://github.com/microsoft/Intrepid. |
| Open Datasets | No | The paper mentions evaluating on a 'synthetic task' and an 'image selection task' but does not provide specific access information (link, DOI, citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper provides details on hyperparameters and model architecture but does not specify training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We use A2600 for all experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We select hyperparameters for each algorithm based on the mean final regret. We tune the hyperparameters λ and C for LORIL and ϵ for ϵ-greedy using grid search. |
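
The Experiment Setup row describes selecting hyperparameters by mean final regret using grid search. A minimal sketch of that selection rule follows. It is not taken from the paper's code: `grid_search`, `toy_final_regret`, the key names `lam` and `C`, and all grid values and seeds are hypothetical placeholders, and the toy function merely stands in for running an algorithm and reading off its final regret.

```python
import itertools
import statistics
from typing import Callable, Dict, Iterable, Sequence, Tuple


def grid_search(
    run_trial: Callable[[Dict[str, float], int], float],
    grid: Dict[str, Sequence[float]],
    seeds: Iterable[int],
) -> Tuple[Dict[str, float], float]:
    """Return the configuration with the lowest mean final regret over seeds."""
    seeds = list(seeds)
    names = list(grid)
    best_config, best_score = None, float("inf")
    # Enumerate every combination of the listed hyperparameter values.
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        # Average the final regret over seeds for this configuration.
        mean_regret = statistics.mean(run_trial(config, s) for s in seeds)
        if mean_regret < best_score:
            best_config, best_score = config, mean_regret
    return best_config, best_score


def toy_final_regret(config: Dict[str, float], seed: int) -> float:
    """Hypothetical stand-in for one experiment run; not the paper's setup."""
    return (config["lam"] - 0.1) ** 2 + (config["C"] - 1.0) ** 2 + 0.001 * seed


# Assumed grid values, chosen only for illustration.
best, score = grid_search(
    toy_final_regret,
    {"lam": [0.01, 0.1, 1.0], "C": [0.5, 1.0, 2.0]},
    seeds=range(3),
)
print(best, score)
```

In a real run, `run_trial` would train the algorithm with the given configuration and seed and return its final regret; the same loop would apply to a single-parameter grid over ϵ for ϵ-greedy.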