Sample Efficient Policy Search for Optimal Stopping Domains

Authors: Karan Goel, Christoph Dann, Emma Brunskill

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation results on student tutoring, ticket purchase, and asset replacement show our approach significantly improves over state-of-the-art approaches. We now demonstrate the setting we consider is sufficiently general to capture several problems of interest and that our approach, GFSE, can improve performance in optimal stopping problems over some state-of-the-art baselines.
Researcher Affiliation | Academia | Karan Goel, Carnegie Mellon University, kgoel93@gmail.com; Christoph Dann, Carnegie Mellon University, cdann@cdann.net; Emma Brunskill, Stanford University, ebrun@cs.stanford.edu
Pseudocode | Yes | Algorithm 1: Gather Full, Search and Execute (GFSE)
Open Source Code | No | The paper only provides a link to a third-party tool (Yelp's MOE) that the authors used, but does not provide access to the source code for their own method (GFSE).
Open Datasets | No | We use data from Groves and Gini (2015) who collected real pricing data for a fixed set of routes over a period of 2 years, querying travel sites regularly to collect price information. Unfortunately, the authors were unable to provide us with the train/test split used in [Groves and Gini, 2015].
Dataset Splits | No | The paper discusses training and testing splits for its experiments but does not mention a validation set or provide specific details on validation splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "Yelp's MOE for BO [Yelp, 2016]" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We construct a parameterized policy class (Πsimple) based on Ripper's decision rules in [Etzioni et al., 2003]: WAIT if (curr price > θ0 AND days to depart > θ1) else BUY. We also constructed a more complex class (Πcomplex) with 6 parameters... It performs a simple policy search by sampling and evaluating 500 policies randomly from the policy space. To simulate student data, we fix BKT parameters pi = 0.18, pt = 0.2, pg = 0.2, ps = 0.1 and generate student trajectories using this BKT model for H = 20 problems... We sample k = 100 policies from the BKT policy class, and fix a budget of B ∈ {100, 1000} trajectories... All results are averaged over 50 trials. We use d = 3 for experiments.
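The Experiment Setup row describes a simple policy search over the ticket-purchase threshold class Πsimple (WAIT if curr price > θ0 AND days to depart > θ1, else BUY), with 500 policies sampled at random and evaluated. Below is a minimal Python sketch of that sampling-and-evaluation loop; the trajectory format, threshold ranges, cost convention, and helper names (make_policy, evaluate, random_policy_search) are illustrative assumptions, not the authors' implementation.

```python
import random

def make_policy(theta0, theta1):
    """Pi_simple rule: WAIT while the price is high and departure is far away."""
    def policy(curr_price, days_to_depart):
        return "WAIT" if (curr_price > theta0 and days_to_depart > theta1) else "BUY"
    return policy

def evaluate(policy, trajectories):
    """Average price paid when following the policy on each trajectory.

    Each trajectory is assumed to be a list of (curr_price, days_to_depart)
    pairs; if the policy never buys, the final observed price is paid.
    """
    total = 0.0
    for traj in trajectories:
        paid = traj[-1][0]  # forced purchase at the end of the horizon
        for price, days_left in traj:
            if policy(price, days_left) == "BUY":
                paid = price
                break
        total += paid
    return total / len(trajectories)

def random_policy_search(trajectories, n_policies=500,
                         price_range=(0.0, 2000.0), horizon=60):
    """Sample n_policies random threshold policies and keep the cheapest one."""
    best_policy, best_cost = None, float("inf")
    for _ in range(n_policies):
        theta0 = random.uniform(*price_range)  # price threshold (assumed range)
        theta1 = random.randrange(horizon)     # days-to-depart threshold
        candidate = make_policy(theta0, theta1)
        cost = evaluate(candidate, trajectories)
        if cost < best_cost:
            best_policy, best_cost = candidate, cost
    return best_policy, best_cost
```

The same sample-and-evaluate pattern would apply to the richer Πcomplex class mentioned in the same row, with six sampled parameters instead of two.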