Universal Option Models
Authors: Hengshuai Yao, Csaba Szepesvári, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method in two domains. The first domain is a real-time strategy game, where the controller must select the best game unit to accomplish a dynamically-specified task. The second domain is article recommendation, where each user query defines a new reward function and an article's relevance is the expected return from following a policy that follows the citations between articles. Our experiments show that UOMs are substantially more efficient than previously known methods for evaluating option returns and policies over options. |
| Researcher Affiliation | Academia | Hengshuai Yao, Csaba Szepesvári, Rich Sutton, Joseph Modayil, Department of Computing Science, University of Alberta, Edmonton, AB, Canada, T6H 4M5, {hengshua,szepesva,sutton,jmodayil}@cs.ualberta.ca; Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India, shalabh@csa.iisc.ernet.in |
| Pseudocode | No | The paper describes algorithms and presents update rules mathematically (e.g., '$U^{o_k}_{k+1} = U^{o_k}_k + \eta^{o_k}_k \delta_{k+1} \phi(s_k)$'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its methodology is publicly available. |
| Open Datasets | No | The paper mentions using 'a collection from DBLP that has about 1.5 million articles', but it does not provide concrete access information (link, DOI, or a full citation for the dataset itself with authors and year for direct retrieval). |
| Dataset Splits | No | The paper mentions using '3000 trajectories' and averaging results 'over 30 runs', but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper mentions running experiments on 'a modern PC with an Intel 1.7GHz processor and 8GB RAM'. While it provides some detail, 'Intel 1.7GHz processor' is not a specific model number (e.g., Core i7-xxxx) required for reproducibility. |
| Software Dependencies | No | The paper reports its results 'in a MATLAB implementation'. However, it does not provide a specific version number for MATLAB or name any other software libraries or dependencies used. |
| Experiment Setup | Yes | The discount factor was 0.9. Features were a lookup table over the 11 × 11 grid. For all algorithms, only one step of planning was applied per action selection. The planning step-size for each algorithm was chosen from {0.001, 0.01, 0.1, 1.0}. Only the best one was reported for an algorithm. All data reported were averaged over 30 runs. |
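
The selection protocol in the Experiment Setup row (sweep the planning step-size over four candidates, average each over 30 runs, report the best) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: `run_once` is a hypothetical stand-in for a single experimental run, and the assumption that a higher score is better is ours, since the table does not name the performance metric.

```python
import random
from statistics import mean

GRID_SIZE = 11                         # 11 x 11 grid, as in the table
STEP_SIZES = (0.001, 0.01, 0.1, 1.0)   # candidate planning step-sizes
NUM_RUNS = 30                          # runs averaged per setting


def lookup_features(row, col):
    """One-hot (lookup-table) feature vector for one cell of the grid."""
    phi = [0.0] * (GRID_SIZE * GRID_SIZE)
    phi[row * GRID_SIZE + col] = 1.0
    return phi


def run_once(step_size, seed):
    """Hypothetical placeholder for one run; returns a score (higher is better).

    A real experiment would run the planner here. This toy score merely
    peaks at step_size = 0.1 with a little seeded noise.
    """
    rng = random.Random(seed)
    return -abs(step_size - 0.1) + rng.gauss(0.0, 0.01)


def select_step_size(run_fn, step_sizes=STEP_SIZES, num_runs=NUM_RUNS):
    """Average each step-size over num_runs seeds and return the best one."""
    averaged = {
        alpha: mean(run_fn(alpha, seed) for seed in range(num_runs))
        for alpha in step_sizes
    }
    best = max(averaged, key=averaged.get)
    return best, averaged


best, averaged = select_step_size(run_once)
print("best step-size:", best)
```

Because only the best setting per algorithm is reported, a faithful reproduction would need to repeat this sweep separately for every algorithm being compared.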