Online Learning and Profit Maximization from Revealed Preferences
Authors: Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, Aaron Roth
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider algorithmic and learning-theoretic aspects of the classic revealed preferences problem. In this paper, however, we have different and stronger objectives, motivated by what power the merchant has to set prices. We consider two scenarios: (Price-Setting) First, we consider a monopolist merchant who has the power to set prices as he wishes (without fear of losing the consumer to competition). In this setting, we adopt the natural goal of merchant profit maximization. The merchant has a fixed unit cost associated with each good, and his profit, when the consumer buys a bundle x, is his revenue minus the cost of the bundle purchased. The merchant wishes to adaptively set prices so as to minimize his costs, which in turn maximizes his profits. Every round, the consumer purchases her utility-maximizing bundle subject to the merchant's prices and her budget constraint. If the merchant knew the consumer's utility function, he could set prices optimally, but he does not; instead, the merchant faces a learning problem. For the case when the consumer has a linear utility function, we give an efficient algorithm for setting prices to quickly learn the consumer's utility function and then exploit this knowledge to set profit-maximizing prices. (Exogenous Prices) Second, we consider a merchant who cannot unilaterally set prices, but instead must react to a stream of exogenously chosen prices. This setting is relevant to a seller of commodity goods, or the owner of a franchise that must set prices given by the parent company. Despite his lack of control over prices, this merchant would nevertheless like to be able to predict which bundle the consumer is going to buy in the next period (e.g. to optimize inventory or streamline the supply chain). The problem remains that the consumer's utility function is unknown, and now the price sequence is also unknown and arbitrary (i.e. it could in the worst case be chosen adaptively, by an adversary). In this setting, when the consumer has a linear utility function, we give an efficient algorithm with a small mistake bound; in other words, even against an adaptively chosen set of price vectors, in the worst case over consumer utility functions, our algorithm makes only a bounded number of mistaken predictions of bundles purchased. We note that there are a variety of scenarios that fall under the two frameworks above. These include sponsored search or contextual advertising on the web (where advertisers typically must obey periodic budget constraints, and prices are set exogenously by the bids of competitors or endogenously by an ad exchange or publisher); consumers who regularly receive gift certificates which can only be used for purchases from a single merchant such as Amazon, who in turn has price-setting powers; and crowdsourcing or labor management settings where a manager (merchant) can set rewards or payments for a set of daily tasks, and workers (consumers) with a budget of time or effort select tasks according to the incentives and their own abilities. From a learning-theoretic point of view, the price-setting framework can be viewed as a form of query model, since the merchant is free to experiment with prices (both for the purpose of learning about the consumer, and subsequently in order to set optimal prices); while the exogenous price model falls into the large literature on adversarial, worst-case online learning.
The learning problem we consider in both settings is unusual, in that the target function we are trying to learn is the vector-valued arg max (optimal bundle) of the consumer's utility function. Additionally, despite the linearity of the consumer's utility function, the merchant's reward function is a non-convex function of prices, which further complicates learning. Our major assumptions (that consumers always spend their entire budget, that there is one divisible unit of each good available in each round, and that consumers repeatedly return to the same merchant) are standard in the revealed preferences model. In order to provide efficient learning algorithms in this setting, we necessarily impose additional restrictions on the form of the consumer's utility function. In particular, we assume that the utility function is linear, and that the coefficients are discretized and lower-bounded. The discretization assumption is necessary to learn the utility function exactly, which is required for the merchant's optimization. Even if two functions differ by an arbitrarily small amount, they can induce the consumer to buy very different bundles. We also assume an upper bound on prices, and without loss of generality we rescale this upper bound to be 1. Without such an assumption, the merchant could maximize his profits by setting all prices to infinity. Unbounded prices are neither found in reality nor lead to an interesting optimization problem. Our Results: We first consider the case of a monopolist merchant who has the ability to set prices arbitrarily, and is facing a consumer with an unknown linear utility function. In this setting, we give an algorithm with bounded regret with respect to the optimal (profit-maximizing) set of prices in hindsight. Our argument proceeds in two steps. We first show that, if we knew the consumer's utility function u, then we could efficiently compute the optimal profit-maximizing prices p*: Theorem 1 (Informal). There is an efficient algorithm (running in time O(n^2 log n)) which, given as input the linear consumer utility function u, outputs the profit-maximizing prices p*. The analysis of this algorithm first assumes we know only the set of goods purchased by the consumer under optimal prices (but not the optimal prices themselves), and introduces a family of linear programs with one free parameter. We then show there is a small set of values for this parameter, one of which yields the optimal prices. Note that although the consumer's optimization problem when selecting a bundle to purchase given prices is simply a fractional knapsack problem (a minimal sketch of this best response appears after the table), the problem of computing optimal prices is substantially more complex. The optimal price vector p* is actually a subgame perfect Nash equilibrium strategy for the merchant in a two-stage extensive form game between the merchant and the consumer (the merchant first picks prices, and then the consumer best responds). Viewed in this way, the fractional knapsack problem that the consumer solves at the second stage is simply her best response function; what we give is an algorithm for computing the merchant's subgame perfect equilibrium strategy in the first stage of this game. Note that doing this in polynomial time is non-trivial, because the merchant's strategy space is continuous (and even after discretization, is exponentially large). We next give an algorithm that learns the consumer's unknown linear utility function by making price queries. Theorem 2 (Informal).
There is an efficient algorithm that learns, after at most O(n) price queries, the utility coefficients for all goods except those that are so preferred they will be bought regardless of prices. This algorithm has two phases. In the first phase, after setting all prices to 1 (the maximum possible price), the algorithm gradually lowers the prices of unpurchased items in succession until they are purchased, thus learning the ratio of their utility coefficient to that of the least preferred good that was purchased under the price vector of all 1s (a sketch of this price-lowering idea appears after the table). The harder coefficients to learn are those corresponding to goods purchased even when all prices are 1; these are the consumer's most preferred goods. Some of these are learned by gradually lowering the prices of unpurchased goods until a switch of purchased goods occurs; for the ones that cannot be learned via this procedure, we prove that they are so favored that they will be purchased under any prices, and learning their coefficients is not necessary for price optimization. These two algorithms combine to prove our first main result: Theorem 3 (Informal). There is a price-setting algorithm that, when interacting with a consumer with an unknown linear utility function for T rounds, achieves regret O(n^2/T) to the profit obtained by the optimal (profit-maximizing) price vector. In the last part of the paper, we consider the case of a commodity merchant who does not have the power to set prices. The merchant wishes to predict the bundles that a consumer with an unknown linear utility function will buy, in the face of a stream of arbitrary price vectors. Here, the main quantity of interest is how many mistakes we make (by predicting the incorrect bundle) in the worst case over both consumer utility functions and sequences of price vectors. We call this the mistake bound of the algorithm (by analogy to the mistake-bounded model of learning). Here we prove our second main result: Theorem 4 (Informal). There exists a polynomial-time algorithm in the online exogenous price model that has a mistake bound of O(n^2) with high probability. We begin by considering the first model, in which the merchant controls prices and seeks to maximize profit. First we show that, given the coefficients v_i of the consumer's linear utility function, we can efficiently compute the profit-maximizing prices. We will then combine this algorithm with a query algorithm for learning the coefficients, thus yielding an online no-regret pricing algorithm. In this section we assume that all the coefficients v_i of the consumer's utility function are known to the merchant. Even then, it is not clear a priori that there exists an efficient algorithm for computing a profit-maximizing price vector p. As previously mentioned, the optimal prices are a subgame perfect Nash equilibrium strategy for the merchant in a two-stage extensive form game, in which the merchant has exponentially many strategies. Straightforwardly computing this equilibrium strategy via backwards induction would therefore be inefficient. Our algorithm accomplishes the task in time only (nearly) quadratic in the number of goods. The key to the algorithm's efficiency will stem from the observation that there exists a restricted family of pricing vectors P ⊆ [0, 1]^n containing a (nearly) profit-maximizing vector p*. This subset P will still be exponentially large in n, but will be derived (in a manner which will be made more precise) from a small set of vectors p^(1), ..., p^(n).
This derivation will allow the algorithm to efficiently search for p*. We define p^(k) by letting p^(k)_i = min(v_i/v_k, 1). In other words, the price of every good whose value is less than that of the kth good is set to the ratio v_i/v_k. Otherwise, if v_i > v_k, the price of good i in p^(k) is set to the price ceiling of 1. To understand the operation of the algorithm, consider the consumer's behavior under the prices p^(k). Any good priced at v_i/v_k will have a bang-per-buck ratio r_i = v_k. Therefore, the consumer's choice is not uniquely specified by p^(k) in general (since the consumer is indifferent between any of the previously mentioned goods). Moreover, the consumer's choice will have great impact on the merchant's profit, since the goods between which the consumer is indifferent might have very different production costs c_i. The algorithm therefore proceeds by computing, for each p^(k), which bundle x^(k) the merchant would like the consumer to purchase under p^(k). More precisely, for each k, the algorithm computes x^(k) ∈ arg max_{x ∈ X(u, p^(k), B)} x · (p^(k) - c) (see the candidate-search sketch after the table). Note that if the merchant were to actually play the price vector p^(k), the consumer would be under no obligation in our model to respond by selecting x^(k). Therefore, the final step of the algorithm is to output a price vector which attains nearly optimal profit, but for which the consumer's behavior is uniquely specified. The analysis proceeds by proof of three key facts. (1) For some k, the optimal profit is attained by (p^(k), x^(k)); that is, OPT = x^(k) · (p^(k) - c) for some k. (2) Given any p^(k), x^(k) can be computed efficiently (in O(n log n) time). Finally, (3) there is some price vector p̂ for which the consumer's choice x̂ is uniquely specified, and where x̂ · (p̂ - c) is close to OPT. We prove the above theorem by establishing these three key facts. The first lemma establishes that the optimal profit is attained by (p^(k), x^(k)) for some k. We give a sketch of the proof. We now provide a query algorithm for learning the coefficients v_i of the consumer's utility function. For the analysis only, we assume without loss of generality that the goods are ordered by decreasing value, i.e. v_1 > ... > v_n. Our algorithm can learn the values in some suffix of this unknown ordering; the values that cannot be learned are irrelevant for setting optimal prices, since those goods will always be purchased by the consumer. Algorithm LearnVal proceeds as follows. On each day we choose a particular price vector, observe the bundle purchased at those prices, and then use this information as part of the learning process. We now combine the previous two sections into a complete online, no-regret algorithm. The informal description of Algorithm ProfitMax is as follows. First we use Algorithm LearnVal to learn all possible s_i ratios. For any good i for which we did not learn s_i, we set p_i to 1. Finally, we apply Algorithm OptPrice to the subset of remaining goods, using the s_i ratios as input. The following main result shows that this approach achieves no regret. First we show that this algorithm correctly composes Algorithms LearnVal and OptPrice and indeed generates an approximately optimal price vector p. As the lemma shows, every time we price according to p, we receive approximately optimal profit. |
| Researcher Affiliation | Academia | Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, Aaron Roth. Computer and Information Science, University of Pennsylvania. {akareem,ldworkin,mkearns,aaroth}@cis.upenn.edu; rachelc@caltech.edu. Research performed while the author was visiting the University of Pennsylvania. Author's current affiliation: Computing and Mathematical Sciences, California Institute of Technology. |
| Pseudocode | Yes | Due to space limitations, some proofs are omitted or only sketched, but are provided in full detail in the full version of this paper, which also contains pseudocode for all algorithms. The full version is available on arXiv at: http://arxiv.org/abs/1407.7294 |
| Open Source Code | No | The paper states that the full version is available on arXiv and contains pseudocode, but it does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not mention using or providing access to any specific dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not mention any training/validation/test dataset splits or validation procedures on empirical data. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithms and proofs; it does not describe any experiments that would require specific hardware. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper describes theoretical algorithms and does not mention any software dependencies with specific version numbers for implementation. |
| Experiment Setup | No | The paper describes theoretical algorithms and proofs, not empirical experiments. Therefore, no experimental setup details like hyperparameters or training settings are provided. |
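
The excerpts above describe the consumer's per-round behavior as a fractional knapsack: buy goods in decreasing order of bang-per-buck v_i/p_i until the budget is spent, with at most one divisible unit of each good available. The following is a minimal Python sketch of that best response under those assumptions; the function name, the lack of tie-breaking for indifferent goods, and the example inputs are illustrative, not taken from the paper.

```python
# Minimal sketch (not the paper's pseudocode): the consumer's best response
# under linear utility is a fractional knapsack over divisible unit-supply goods.

def consumer_best_response(values, prices, budget):
    """Greedily buy goods by bang-per-buck v_i / p_i until the budget
    (or the single available unit of each good) is exhausted."""
    n = len(values)
    bundle = [0.0] * n
    remaining = budget
    # Sort goods by decreasing bang-per-buck ratio.
    order = sorted(range(n), key=lambda i: values[i] / prices[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        # Buy as much of good i as the budget allows, at most one unit.
        qty = min(1.0, remaining / prices[i])
        bundle[i] = qty
        remaining -= qty * prices[i]
    return bundle

# Illustrative example: good 0 has the higher bang-per-buck and is bought first.
print(consumer_best_response(values=[3.0, 1.0], prices=[1.0, 0.5], budget=1.2))
# -> [1.0, 0.4]
```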
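
The OptPrice discussion above searches over the n candidate price vectors p^(k) with p^(k)_i = min(v_i/v_k, 1) and, for each, computes the bundle x^(k) the merchant would most like the consumer to buy. The sketch below is a hedged reading of that step: it resolves the consumer's indifference in the merchant's favor by spending leftover budget on the goods with the best margin per unit of revenue. All names are illustrative, the tie-breaking rule is an assumption consistent with the informal description, and the paper's final step (moving to a nearby price vector p̂ under which the consumer's choice is unique) is omitted.

```python
# Hedged sketch of the candidate-price search over p^(1), ..., p^(n).
# Not the paper's exact pseudocode; details of tie-breaking are assumptions.

def merchant_favored_bundle(values, costs, prices, budget, k):
    """Bundle the merchant would like the consumer to buy under p^(k)."""
    n = len(values)
    bundle = [0.0] * n
    remaining = budget
    # Goods strictly more valuable than good k are priced at 1 and have a
    # strictly higher bang-per-buck, so the consumer buys them first.
    for i in (i for i in range(n) if values[i] > values[k]):
        qty = min(1.0, remaining / prices[i])
        bundle[i] = qty
        remaining -= qty * prices[i]
    # The consumer is indifferent among the remaining goods (equal
    # bang-per-buck v_k), so resolve the tie in the merchant's favor:
    # spend leftover budget on goods with the best margin per unit of
    # revenue, (p_i - c_i) / p_i.
    indifferent = [i for i in range(n) if values[i] <= values[k]]
    indifferent.sort(key=lambda i: (prices[i] - costs[i]) / prices[i], reverse=True)
    for i in indifferent:
        if remaining <= 0:
            break
        qty = min(1.0, remaining / prices[i])
        bundle[i] = qty
        remaining -= qty * prices[i]
    return bundle

def opt_price_candidates(values, costs, budget):
    """Return the most profitable (profit, p^(k), x^(k)) among the n candidates."""
    n = len(values)
    best = None
    for k in range(n):
        prices = [min(values[i] / values[k], 1.0) for i in range(n)]
        bundle = merchant_favored_bundle(values, costs, prices, budget, k)
        profit = sum(x * (p - c) for x, p, c in zip(bundle, prices, costs))
        if best is None or profit > best[0]:
            best = (profit, prices, bundle)
    return best
```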
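
Finally, the first phase of LearnVal described above sets all prices to 1 and then lowers the price of each unpurchased good until it enters the bundle, revealing the ratio of its coefficient to that of the least-preferred good purchased at all-1 prices. The sketch below illustrates only that first phase against a black-box consumer; the discretization step `delta`, the `query_consumer` interface, and the stopping rule are assumptions for illustration rather than the paper's pseudocode.

```python
# Hedged sketch of the price-lowering queries in the first phase of LearnVal.

def learn_value_ratios(query_consumer, n, delta):
    """query_consumer(prices) returns the bundle the consumer buys.
    Returns, for each good not bought at all-1 prices, the approximate ratio
    of its value to that of the least-preferred good bought at price 1."""
    all_ones = [1.0] * n
    bought_at_one = {i for i, x in enumerate(query_consumer(all_ones)) if x > 0}
    ratios = {}
    for j in (j for j in range(n) if j not in bought_at_one):
        # Lower good j's price along the discretization grid until it enters
        # the purchased bundle; the switch point approximates v_j / v_ref.
        price = 1.0
        while price > delta:
            price -= delta
            prices = list(all_ones)
            prices[j] = price
            if query_consumer(prices)[j] > 0:
                ratios[j] = price  # j's bang-per-buck just overtook the marginal good
                break
    return ratios
```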