Semi-Parametric Efficient Policy Learning with Continuous Actions
Authors: Victor Chernozhukov, Mert Demirer, Greg Lewis, Vasilis Syrgkanis
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation. (Section 4, Application: Personalized Pricing; Figure 1: (a) Policy Evaluation, (b) Regret.) Our simulation design considers a sparse model. In each experiment, we generate 1000, 2000, 5000, and 10000 data points, and report results over 100 simulations. |
| Researcher Affiliation | Collaboration | Mert Demirer (MIT, mdemirer@mit.edu); Vasilis Syrgkanis (Microsoft Research, vasy@microsoft.com); Greg Lewis (Microsoft Research, glewis@microsoft.com); Victor Chernozhukov (MIT, vchern@mit.edu) |
| Pseudocode | Yes | Algorithm 1: Out-of-Sample Regularized ERM with Nuisance Estimates |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the methodology described. |
| Open Datasets | No | Our simulation design considers a sparse model. We assume that there are k continuous context variables distributed uniformly z_i ~ U(1, 2) for i = 1, ..., k but only l of them affects demand. Let z̄ = (z_1 + ... + z_l)/l. Price x and demand d are generated as x ~ N(z̄, 1), d = a(z̄) - b(z̄)x + ε, with ε ~ N(0, 1). (The paper describes a synthetic data generation process but does not state that the resulting dataset is publicly available or provide access information for it.) |
| Dataset Splits | Yes | In particular, we crucially need to augment the ERM algorithm with a validation step, where we split our data into a training and a validation set... Algorithm 1: ... which we randomly split in two parts S1 and S2. Moreover, we randomly split S2 into validation and training samples S2^v and S2^t. We estimate the nuisance functions using 5-fold cross-validated lasso model with polynomials of degrees up to 3 and all the two-way interactions of context variables. (A hedged sketch of this sample-splitting structure appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using a '5-fold cross-validated lasso model' but does not specify any software libraries or their version numbers (e.g., scikit-learn version, R package version) that were used. |
| Experiment Setup | Yes | We estimate the nuisance functions using 5-fold cross-validated lasso model with polynomials of degrees up to 3 and all the two-way interactions of context variables. Our simulation design considers a sparse model. We assume that there are k continuous context variables distributed uniformly z_i ~ U(1, 2) for i = 1, ..., k but only l of them affects demand. Let z̄ = (z_1 + ... + z_l)/l. Price x and demand d are generated as x ~ N(z̄, 1), d = a(z̄) - b(z̄)x + ε, with ε ~ N(0, 1). We consider four functional forms for the demand model: (i) (Quadratic) a(z) = 2z^2, b(z) = 0.6z; (ii) (Step) a(z) = 5·1{z < 1.5} + 6·1{z > 1.5}, b(z) = 0.7·1{z < 1.5} + 1.2·1{z > 1.5}; (iii) (Sigmoid) a(z) = 1/(1 + exp(z)) + 3, b(z) = 2/(1 + exp(z)) + 0.1; (iv) (Linear) a(z) = 6z, b(z) = z. In each experiment, we generate 1000, 2000, 5000, and 10000 data points, and report results over 100 simulations. We present the results for two regimes: (i) Low dimensional with k = 2, l = 1; (ii) High dimensional with k = 10, l = 3. (Hedged sketches of this data-generating process and of the nuisance estimation appear after the table.) |
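
The synthetic data-generating process quoted in the Experiment Setup row is specified in enough detail to sketch. The snippet below is a hedged reconstruction, not the authors' code: the function and design names are illustrative, and the minus sign in d = a(z̄) - b(z̄)x (downward-sloping demand) is an assumption, since the extracted equation drops the operator.

```python
# Hypothetical sketch of the synthetic pricing DGP described in the paper's quotes.
# Names (simulate_pricing_data, DESIGNS) are illustrative; the sign on b(z_bar) is assumed.
import numpy as np

DESIGNS = {
    "quadratic": (lambda z: 2 * z**2,                    lambda z: 0.6 * z),
    "step":      (lambda z: np.where(z < 1.5, 5.0, 6.0), lambda z: np.where(z < 1.5, 0.7, 1.2)),
    "sigmoid":   (lambda z: 1 / (1 + np.exp(z)) + 3,     lambda z: 2 / (1 + np.exp(z)) + 0.1),
    "linear":    (lambda z: 6 * z,                       lambda z: z),
}

def simulate_pricing_data(n, k, l, design, seed=0):
    """Generate n samples with k context variables, only the first l affecting demand."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(1, 2, size=(n, k))      # z_i ~ U(1, 2), i = 1, ..., k
    z_bar = z[:, :l].mean(axis=1)           # z_bar = (z_1 + ... + z_l) / l
    a, b = DESIGNS[design]
    x = rng.normal(z_bar, 1.0)              # price x ~ N(z_bar, 1)
    eps = rng.normal(0.0, 1.0, size=n)      # noise eps ~ N(0, 1)
    d = a(z_bar) - b(z_bar) * x + eps       # demand (minus sign assumed)
    return z, x, d

# Low-dimensional regime from the paper: k = 2, l = 1; n in {1000, 2000, 5000, 10000}.
z, x, d = simulate_pricing_data(n=1000, k=2, l=1, design="quadratic")
```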
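The nuisance-estimation recipe quoted above (a 5-fold cross-validated lasso on polynomials of degrees up to 3 with two-way interactions of the context variables) can likewise be sketched. Which quantities the paper treats as nuisances is not spelled out in the excerpts, so the conditional mean of price given context is used here only as a plausible example, and `PolynomialFeatures(degree=3)` stands in for the stated feature set.

```python
# Minimal sketch of the quoted nuisance-estimation recipe: 5-fold CV lasso on
# polynomial features of the context variables. The regression target shown here
# (price given context) is an assumption for illustration, not taken from the paper.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LassoCV

def fit_nuisance(z, target):
    """Fit one nuisance regression target ~ f(z) with a 5-fold cross-validated lasso."""
    model = make_pipeline(
        PolynomialFeatures(degree=3, include_bias=False),  # degree-3 terms, incl. two-way interactions
        StandardScaler(),
        LassoCV(cv=5),
    )
    return model.fit(z, target)

# Example on placeholder data shaped like the low-dimensional regime (k = 2).
rng = np.random.default_rng(0)
z = rng.uniform(1, 2, size=(1000, 2))
x = rng.normal(z[:, :1].mean(axis=1), 1.0)
price_model = fit_nuisance(z, x)
x_hat = price_model.predict(z)
```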
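Finally, the sample-splitting structure reported for Algorithm 1 (Out-of-Sample Regularized ERM with Nuisance Estimates) can be outlined. Only the existence of the S1/S2 split and the further S2^t/S2^v split is taken from the quoted text; the split proportions and the exact role of each fold are assumptions.

```python
# Hedged skeleton of the data splits described for Algorithm 1. Split proportions
# are assumed (equal halves); the comments below give one natural reading of how
# the folds are used, not a verbatim reproduction of the algorithm.
import numpy as np

def split_indices(n, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    s1, s2 = idx[: n // 2], idx[n // 2:]                 # S1, S2 (proportion assumed)
    s2_t, s2_v = s2[: len(s2) // 2], s2[len(s2) // 2:]   # S2^t (training), S2^v (validation)
    return s1, s2_t, s2_v

s1, s2_t, s2_v = split_indices(n=1000)
# One natural use of the folds, under the assumptions stated above:
#   1. estimate the nuisance functions on S1 (e.g. with fit_nuisance above);
#   2. run regularized ERM over the policy class on S2^t for each regularization level;
#   3. select the regularization level via the out-of-sample objective on S2^v.
```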