Semi-Parametric Efficient Policy Learning with Continuous Actions

Authors: Victor Chernozhukov, Mert Demirer, Greg Lewis, Vasilis Syrgkanis

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation." Section 4, Application: Personalized Pricing: "Experimental evaluation." Figure 1: (a) Policy Evaluation, (b) Regret. "Our simulation design considers a sparse model." "In each experiment, we generate 1000, 2000, 5000, and 10000 data points, and report results over 100 simulations."
Researcher Affiliation | Collaboration | Mert Demirer (MIT, mdemirer@mit.edu); Vasilis Syrgkanis (Microsoft Research, vasy@microsoft.com); Greg Lewis (Microsoft Research, glewis@microsoft.com); Victor Chernozhukov (MIT, vchern@mit.edu)
Pseudocode | Yes | Algorithm 1: Out-of-Sample Regularized ERM with Nuisance Estimates
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the methodology described.
Open Datasets | No | "Our simulation design considers a sparse model. We assume that there are $k$ continuous context variables distributed uniformly $z_i \sim U(1, 2)$ for $i = 1, \ldots, k$ but only $l$ of them affect demand. Let $\bar{z} = \frac{1}{l}(z_1 + \cdots + z_l)$. Price $p$ and demand $d$ are generated as $x \sim N(\bar{z}, 1)$, $d = a(\bar{z}) - b(\bar{z})x + \epsilon$ and $\epsilon \sim N(0, 1)$." (The paper describes a synthetic data generation process but does not state that the resulting dataset is publicly available or provide access information for it.)
Dataset Splits | Yes | "In particular, we crucially need to augment the ERM algorithm with a validation step, where we split our data into a training and validation step..." Algorithm 1: "... which we randomly split in two parts $S_1$, $S_2$. Moreover, we randomly split $S_2$ into validation and training samples $S_2^v$ and $S_2^t$." "We estimate the nuisance functions using 5-fold cross-validated lasso model with polynomials of degrees up to 3 and all the two-way interactions of context variables."
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using a '5-fold cross-validated lasso model' but does not specify any software libraries or their version numbers (e.g., scikit-learn version, R package version) that were used.
Experiment Setup | Yes | "We estimate the nuisance functions using 5-fold cross-validated lasso model with polynomials of degrees up to 3 and all the two-way interactions of context variables." "Our simulation design considers a sparse model. We assume that there are $k$ continuous context variables distributed uniformly $z_i \sim U(1, 2)$ for $i = 1, \ldots, k$ but only $l$ of them affect demand. Let $\bar{z} = \frac{1}{l}(z_1 + \cdots + z_l)$. Price $p$ and demand $d$ are generated as $x \sim N(\bar{z}, 1)$, $d = a(\bar{z}) - b(\bar{z})x + \epsilon$ and $\epsilon \sim N(0, 1)$. We consider four functional forms for the demand model: (i) (Quadratic) $a(z) = 2z^2$, $b(z) = 0.6z$; (ii) (Step) $a(z) = 5 \cdot 1\{z < 1.5\} + 6 \cdot 1\{z > 1.5\}$, $b(z) = 0.7 \cdot 1\{z < 1.5\} + 1.2 \cdot 1\{z > 1.5\}$; (iii) (Sigmoid) $a(z) = 1/(1 + \exp(z)) + 3$, $b(z) = 2/(1 + \exp(z)) + 0.1$; (iv) (Linear) $a(z) = 6z$, $b(z) = z$. In each experiment, we generate 1000, 2000, 5000, and 10000 data points, and report results over 100 simulations. We present the results for two regimes: (i) Low dimensional with $k = 2$, $l = 1$, (ii) High dimensional with $k = 10$, $l = 3$."
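
Since no code or dataset is released, the simulation design quoted in the Open Datasets and Experiment Setup rows may be easier to follow as a short sketch. The Python below is only an illustration under stated assumptions, not the authors' implementation: the names `DEMAND_MODELS` and `simulate_pricing_data` are hypothetical, the first l context variables are assumed to be the ones that affect demand, and demand is assumed to decrease in price (d = a(z̄) - b(z̄)x + ε), since the sign is lost in the extracted text.

```python
import numpy as np

# Coefficient functions (a, b) for the four demand models quoted in the table.
# Boolean comparisons act as the indicator functions of the step design.
DEMAND_MODELS = {
    "quadratic": (lambda z: 2 * z**2, lambda z: 0.6 * z),
    "step": (
        lambda z: 5 * (z < 1.5) + 6 * (z > 1.5),
        lambda z: 0.7 * (z < 1.5) + 1.2 * (z > 1.5),
    ),
    "sigmoid": (
        lambda z: 1 / (1 + np.exp(z)) + 3,
        lambda z: 2 / (1 + np.exp(z)) + 0.1,
    ),
    "linear": (lambda z: 6 * z, lambda z: z),
}


def simulate_pricing_data(n, k, l, model, seed=None):
    """Draw n samples of (contexts z, price x, demand d) from the sparse design.

    Hypothetical helper: the name, signature, and seeding are assumptions.
    """
    rng = np.random.default_rng(seed)
    a, b = DEMAND_MODELS[model]

    # k context variables, each uniform on (1, 2).
    z = rng.uniform(1.0, 2.0, size=(n, k))
    # Assumption: the first l contexts are the relevant ones.
    z_bar = z[:, :l].mean(axis=1)

    # Price is normal around z_bar with unit variance.
    x = rng.normal(loc=z_bar, scale=1.0)
    # Assumption: demand falls with price (minus sign reconstructed).
    eps = rng.normal(size=n)
    d = a(z_bar) - b(z_bar) * x + eps
    return z, x, d
```

Under these assumptions, the two reported regimes correspond to `simulate_pricing_data(n, k=2, l=1, model=...)` and `simulate_pricing_data(n, k=10, l=3, model=...)`, with n taken from 1000, 2000, 5000, and 10000 and each configuration repeated over 100 simulation draws.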