Sample Efficient Policy Search for Optimal Stopping Domains

Authors: Karan Goel, Christoph Dann, Emma Brunskill

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation results on student tutoring, ticket purchase, and asset replacement show our approach significantly improves over state-of-the-art approaches. We now demonstrate the setting we consider is sufficiently general to capture several problems of interest and that our approach, GFSE, can improve performance in optimal stopping problems over some state-of-the-art baselines.
Researcher Affiliation | Academia | Karan Goel, Carnegie Mellon University, kgoel93@gmail.com; Christoph Dann, Carnegie Mellon University, cdann@cdann.net; Emma Brunskill, Stanford University, ebrun@cs.stanford.edu
Pseudocode | Yes | Algorithm 1: Gather Full, Search and Execute (GFSE)
Open Source Code | No | The paper only provides a link to a third-party tool (Yelp's MOE) that the authors used, but does not provide access to the source code for their own method (GFSE).
Open Datasets | No | We use data from Groves and Gini (2015) who collected real pricing data for a fixed set of routes over a period of 2 years, querying travel sites regularly to collect price information. Unfortunately, the authors were unable to provide us with the train/test split used in [Groves and Gini, 2015].
Dataset Splits | No | The paper discusses training and testing splits for its experiments but does not mention a validation set or provide specific details on validation splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "Yelp's MOE for BO [Yelp, 2016]" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We construct a parameterized policy class (Πsimple) based on Ripper's decision rules in [Etzioni et al., 2003]: WAIT if (curr price > θ0 AND days to depart > θ1) else BUY. We also constructed a more complex class (Πcomplex) with 6 parameters... It performs a simple policy search by sampling and evaluating 500 policies randomly from the policy space. To simulate student data, we fix BKT parameters pi = 0.18, pt = 0.2, pg = 0.2, ps = 0.1 and generate student trajectories using this BKT model for H = 20 problems... We sample k = 100 policies from the BKT policy class, and fix a budget of B ∈ {100, 1000} trajectories... All results are averaged over 50 trials. We use d = 3 for experiments.
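The Experiment Setup row describes a simple policy search over the ticket-purchase threshold class Πsimple (WAIT if curr price > θ0 AND days to depart > θ1, else BUY), with 500 policies sampled at random and evaluated. Below is a minimal Python sketch of that sampling-and-evaluation loop; the trajectory format, threshold ranges, cost convention, and helper names (make_policy, evaluate, random_policy_search) are illustrative assumptions, not the authors' implementation.

```python
import random

def make_policy(theta0, theta1):
    """Pi_simple rule: WAIT while the price is high and departure is far away."""
    def policy(curr_price, days_to_depart):
        return "WAIT" if (curr_price > theta0 and days_to_depart > theta1) else "BUY"
    return policy

def evaluate(policy, trajectories):
    """Average price paid when following the policy on each trajectory.

    Each trajectory is assumed to be a list of (curr_price, days_to_depart)
    pairs; if the policy never buys, the final observed price is paid.
    """
    total = 0.0
    for traj in trajectories:
        paid = traj[-1][0]  # forced purchase at the end of the horizon
        for price, days_left in traj:
            if policy(price, days_left) == "BUY":
                paid = price
                break
        total += paid
    return total / len(trajectories)

def random_policy_search(trajectories, n_policies=500,
                         price_range=(0.0, 2000.0), horizon=60):
    """Sample n_policies random threshold policies and keep the cheapest one."""
    best_policy, best_cost = None, float("inf")
    for _ in range(n_policies):
        theta0 = random.uniform(*price_range)  # price threshold (assumed range)
        theta1 = random.randrange(horizon)     # days-to-depart threshold
        candidate = make_policy(theta0, theta1)
        cost = evaluate(candidate, trajectories)
        if cost < best_cost:
            best_policy, best_cost = candidate, cost
    return best_policy, best_cost
```

The same sample-and-evaluate pattern would apply to the richer Πcomplex class mentioned in the same row, with six sampled parameters instead of two.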