Time-Regularized Interrupting Options (TRIO)
Authors: Timothy Mann, Daniel Mankowitz, Shie Mannor
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that this approach can derive a good set of high-level skills even when the original set of skills cannot solve the problem. |
| Researcher Affiliation | Academia | Daniel J. Mankowitz (danielm@tx.technion.ac.il), Timothy A. Mann (mann@ee.technion.ac.il), Shie Mannor (shie@ee.technion.ac.il); Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel |
| Pseudocode | Yes | Algorithm 1: Interrupting Option Value Iteration |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code of its methodology. |
| Open Datasets | No | The paper uses a gridworld domain (Sutton & Barto, 1998) and an inventory management domain (Scarf, 1959). Both are simulated environments rather than fixed datasets, and the paper provides no link, DOI, or other access information for any publicly available dataset used in its experiments. |
| Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, or test sets, nor does it reference predefined splits with citations. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list any specific software components with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | The resulting algorithm has two tunable parameters, l and λ: l controls the frequency at which the options are updated, and λ ∈ [0, 1] controls the time-based regularization. We experimented with l ∈ {1, 10, 20, 30, 40} and λ ∈ {0, 0.1, 0.3, 0.5}, unless noted otherwise. |
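Algorithm 1 (Interrupting Option Value Iteration) builds on the option-interruption rule of Sutton, Precup & Singh (1999): a running option is terminated early whenever switching to another option has a higher value than continuing. The sketch below shows only that decision at a single state, assuming the option values Q(s, ·) are already available as a dictionary. The function name `should_interrupt` and the `margin` argument are hypothetical; `margin` is an illustrative stand-in for "favor longer options" and is NOT the paper's time-based regularizer, whose exact form is not given here.

```python
from typing import Dict, Hashable


def should_interrupt(
    q_values: Dict[Hashable, float],
    current_option: Hashable,
    margin: float = 0.0,
) -> bool:
    """Decide whether to interrupt the running option at the current state.

    q_values maps each available option o to Q(s, o) at the current state s.

    With margin == 0.0 this is the interruption rule of Sutton et al. (1999):
    interrupt iff Q(s, current) < max_o' Q(s, o').

    margin > 0.0 is a HYPOTHETICAL regularizer (not the paper's definition):
    interrupt only when the gain from switching exceeds `margin`, which makes
    options run longer before being cut off.
    """
    best_value = max(q_values.values())
    return best_value - q_values[current_option] > margin
```

For example, with `q = {"left": 1.0, "right": 1.2}`, `should_interrupt(q, "left")` is `True`, while `should_interrupt(q, "left", margin=0.3)` is `False` because the switching gain of about 0.2 does not clear the margin.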
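The Experiment Setup row lists five values of l and four values of λ. Assuming every combination was run (the row does not say whether the sweep was full factorial), the grid contains 5 × 4 = 20 configurations. The sketch below enumerates them; the constant and function names are hypothetical, and no training code is implied since the paper releases none.

```python
from itertools import product
from typing import Dict, List

# Values taken from the Experiment Setup row ("unless noted otherwise").
L_VALUES = [1, 10, 20, 30, 40]        # l: how often the options are updated
LAMBDA_VALUES = [0.0, 0.1, 0.3, 0.5]  # lambda in [0, 1]: time-based regularization strength


def sweep_configs() -> List[Dict[str, float]]:
    """Enumerate every (l, lambda) pair, assuming a full factorial grid."""
    return [{"l": l, "lambda": lam} for l, lam in product(L_VALUES, LAMBDA_VALUES)]
```

`sweep_configs()` returns 20 dictionaries, ordered with l in the outer loop, from `{"l": 1, "lambda": 0.0}` to `{"l": 40, "lambda": 0.5}`.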