Logistic Markov Decision Processes

Authors: Martin Mladenov, Craig Boutilier, Dale Schuurmans, Ofer Meshi, Gal Elidan, Tyler Lu

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of the approach on large models, derived from in-app display advertising data, with up to 2^54 states and 2^39 actions. We test the logistic MDP framework and the ALP-APPROX algorithm using models of various sizes derived from a targeted advertising domain. We generate logistic pCTR models that predict the probability of a user response ϕ, representing a click on a displayed (in-app) ad. pCTR models of several sizes are learned from data drawn over a period of one day with roughly 300M training examples.
Researcher Affiliation | Collaboration | Martin Mladenov (TU-Dortmund, martin.mladenov@cs.tu-dortmund.de); Craig Boutilier (Google Research, cboutilier@google.com); Dale Schuurmans (University of Alberta, daes@ualberta.ca); Ofer Meshi (Google Research, meshi@google.com); Gal Elidan (Google Research, elidan@google.com); Tyler Lu (Google Research, tylerlu@google.com)
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. A footnote mentions that an extended version of the paper is available online, but not code.
Open Datasets | No | The paper mentions 'in-app display advertising data' and '300M training examples' but does not provide concrete access information (e.g., public dataset name, URL, DOI, or formal citation with authors/year) for this data.
Dataset Splits | No | The paper mentions '300M training examples' but does not provide specific details on how this data was split into training, validation, and test sets. The term 'validation' is used in the context of the algorithm's convergence, not dataset partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions using GLOP and SCIP to solve LP and BOP problems, respectively, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We run ALP-APPROX using one basis function per sbf (e.g., LARGE uses 2854 basis functions). We solve the subproblems using various numbers of subintervals, or bands, to assess the impact of coarse vs. fine discretization. The reward function is 1 for a click, and our sequential objective is to maximize the expected discounted cumulative click-through rate.
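
The objective described in the Experiment Setup entry (reward 1 for a click, discounted over time, with click probability given by a logistic pCTR model) can be illustrated with a minimal Python sketch. This is not the paper's ALP-APPROX algorithm or its model. The function names, weights, feature vectors, and discount factor below are hypothetical, and the sketch evaluates a fixed sequence of feature vectors rather than modelling how a response affects later states.

```python
import math


def pctr(weights, bias, features):
    """Logistic click probability: sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def expected_discounted_clicks(click_probs, gamma):
    """Sum over t of gamma^t * P(click at step t), with reward 1 per click."""
    return sum((gamma ** t) * p for t, p in enumerate(click_probs))


# Hypothetical weights and binary feature vectors, for illustration only.
weights = [0.8, -0.5, 1.2]
bias = -2.0
trajectory = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]

probs = [pctr(weights, bias, x) for x in trajectory]
value = expected_discounted_clicks(probs, gamma=0.9)
print(f"per-step click probabilities: {probs}")
print(f"expected discounted clicks: {value:.4f}")
```

Because each step's reward is the click indicator, its expectation is simply the predicted click probability, which is why the discounted sum of probabilities gives the expected discounted click count along this fixed sequence.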