Universal Option Models

Authors: Hengshuai Yao, Csaba Szepesvári, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our method in two domains. The first domain is a real-time strategy game, where the controller must select the best game unit to accomplish a dynamically-specified task. The second domain is article recommendation, where each user query defines a new reward function and an article's relevance is the expected return from following a policy that follows the citations between articles. Our experiments show that UOMs are substantially more efficient than previously known methods for evaluating option returns and policies over options.
Researcher Affiliation | Academia | Hengshuai Yao, Csaba Szepesvári, Rich Sutton, Joseph Modayil; Department of Computing Science, University of Alberta, Edmonton, AB, Canada, T6H 4M5; {hengshua,szepesva,sutton,jmodayil}@cs.ualberta.ca. Shalabh Bhatnagar; Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India; shalabh@csa.iisc.ernet.in
Pseudocode | No | The paper describes algorithms and presents update rules mathematically (e.g., '$U^o_{k+1} = U^o_k + \eta^o_k \delta_{k+1} \phi(s_k)$'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its methodology is publicly available.
Open Datasets | No | The paper mentions using 'a collection from DBLP that has about 1.5 million articles', but it does not provide concrete access information (a link, a DOI, or a full citation for the dataset itself, with authors and year, that would allow direct retrieval).
Dataset Splits | No | The paper mentions using '3000 trajectories' and averaging results 'over 30 runs', but it does not specify explicit train/validation/test splits by percentage or sample count, nor does it reference predefined splits.
Hardware Specification | No | The paper mentions running experiments on 'a modern PC with an Intel 1.7GHz processor and 8GB RAM'. This gives some detail, but 'Intel 1.7GHz processor' does not identify a specific processor model (e.g., Core i7-xxxx), which is required for reproducibility.
Software Dependencies | No | The paper states that the work was done 'in a MATLAB implementation', but it does not provide a version number for MATLAB or for any other software libraries or dependencies used.
Experiment Setup | Yes | The discount factor was 0.9. Features were a lookup table over the 11 × 11 grid. For all algorithms, only one step of planning was applied per action selection. The planning step-size for each algorithm was chosen from {0.001, 0.01, 0.1, 1.0}. Only the best one was reported for each algorithm. All data reported were averaged over 30 runs.
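
The step-size selection described in the Experiment Setup entry (try each candidate, average over 30 runs, report the best) can be sketched roughly as below. This is only an illustration of the protocol, not the authors' MATLAB code: `run_trial`, its noisy quadratic score, and the "lower is better" convention are hypothetical stand-ins introduced here. Only the candidate step-sizes and the run count come from the text above.

```python
import random
import statistics

# Candidate planning step-sizes and number of runs, as listed in the setup.
STEP_SIZES = [0.001, 0.01, 0.1, 1.0]
NUM_RUNS = 30


def run_trial(step_size, seed):
    """Hypothetical stand-in for one experimental run; returns an error score.

    Not from the paper: a noisy quadratic with its minimum near 0.1,
    used only so that the sweep has something to measure.
    """
    rng = random.Random(seed)
    return (step_size - 0.1) ** 2 + rng.uniform(0.0, 0.001)


def sweep(step_sizes=STEP_SIZES, num_runs=NUM_RUNS, trial=run_trial):
    """Average each step-size over num_runs seeded runs.

    Returns (best_step_size, mean_score_per_step_size).
    """
    means = {}
    for eta in step_sizes:
        scores = [trial(eta, seed) for seed in range(num_runs)]
        means[eta] = statistics.mean(scores)
    # Assumption: a lower score is better (e.g., an error measure).
    best = min(means, key=means.get)
    return best, means


best, means = sweep()
print("best step-size:", best)
```

In a real replication, `run_trial` would be replaced by one full run of the algorithm under test, and the comparison direction would follow whichever performance measure is being reported.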