Learning to Mitigate Externalities: the Coase Theorem with Hindsight Rationality
Authors: Antoine Scheid, Aymeric Capitaine, Etienne Boursier, Eric Moulines, Michael Jordan, Alain Durmus
NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We conclude this section with experiments showing the empirical convergence of our algorithm to a social optimum. In the simulation, we consider two firms, with firm 1 being upstream and firm 2 being downstream. Their profit functions are respectively given by π1 :7 max q1 2 q1 π2(q1, q2) 7 max Thus, firm 1's and firm 2's profit functions depend quadratically on $q_1$, with a firm 1 optimum at $q_1 = 5$ and a social optimum at $q_1 = 8$. Note that in the expression of $\pi_2$, $q_2$ has very little influence compared to $q_1$, which allows us to plot profits for only one value of $q_2$. We discretize the setup, consider a bandit instance (horizon $T = 5 \cdot 10^6$, 10 arms, average over 10 rounds) and assume that UCB is used as a subroutine. In the first setting, there are no property rights and each firm runs UCB on its side. In the second, property rights are defined and firm 2 runs BELGIC as its policy. The plots in Figure 1 display the empirical frequencies and show empirically the effectiveness of BELGIC in mitigating externalities. |
| Researcher Affiliation | Academia | Antoine Scheid¹, Aymeric Capitaine¹, Etienne Boursier², Eric Moulines¹, Michael I. Jordan³˒⁴, Alain Durmus¹. ¹ Centre de Mathématiques Appliquées, CNRS, École polytechnique, Palaiseau, 91120, France; ² INRIA Saclay, Université Paris Saclay, LMO, Orsay, 91400, France; ³ University of California, Berkeley; ⁴ Inria, Ecole Normale Supérieure, PSL Research University, Paris, 75, France |
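| Pseudocode | Yes | Algorithm 1 BELGIC<br>1: Input: set of actions $\mathcal{A} = [K]$, time horizon $T$, subroutine $\Pi^{\mathrm{up}}_p$, upstream player's regret constants $C$, $\kappa$, parameters $\alpha$ and $\beta$.<br>2: Compute $\mathcal{H}^{\mathrm{down},p}_s = \emptyset$ for any $s \le K \lceil \log_2 T^{\beta} \rceil \lceil T^{\alpha} \rceil$.<br>3: for $a \in \mathcal{A}$ do<br>4: &nbsp;&nbsp;# See Algorithm 2<br>5: &nbsp;&nbsp;$\underline{\tau}_a, \overline{\tau}_a = \text{Binary Search}(a, \lceil \log_2 T^{\beta} \rceil, \lceil T^{\alpha} \rceil, 0, 1)$<br>6: end for<br>7: For any action $a \in \mathcal{A}$, $\hat{\tau}_a = \overline{\tau}_a + 1/T^{\beta} + C T^{(\kappa - 1)/2}$.<br>8: for $t = K \lceil T^{\alpha} \rceil \lceil \log_2 T^{\beta} \rceil + 1, \ldots, T$ do<br>9: &nbsp;&nbsp;Get recommended actions by Bandit-Alg on the $\mathcal{A} \times \mathcal{A}$ bandit instance, $(a_t, B_t) = \text{Bandit-Alg}(U_t, \mathcal{H}^{\mathrm{down},p}_{t-1})$.<br>10: &nbsp;&nbsp;Offer a transfer $\hat{\tau}_{a_t}$ on action $a_t$, nothing for any other action $a \in \mathcal{A}$, and play action $B_t$.<br>11: &nbsp;&nbsp;Observe $A_t = \Pi^{\mathrm{up}}_p(a_{t+1}, \tau(t+1), V_t, \mathcal{H}^{\mathrm{up},p}_{t-1})$, $X_{a_t, B_t}(t)$.<br>12: &nbsp;&nbsp;if $A_t = a_t$ then update history $\mathcal{H}^{\mathrm{down},p}_t$.<br>13: &nbsp;&nbsp;end if<br>14: &nbsp;&nbsp;Update upstream player's history $\mathcal{H}^{\mathrm{up},p}_t$.<br>15: end for |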
| Open Source Code | No | The NeurIPS checklist states: "Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: Since our work is mostly a theoretic contribution, we do not present experiments. Therefore, our paper is not concerned by this issue (same answer as for item 5)." No explicit link or statement about open-source code availability is provided within the paper. |
| Open Datasets | No | The paper describes a simulated bandit instance setup with specific parameters ("horizon $T = 5 \cdot 10^6$, 10 arms, average over 10 rounds") and defines profit functions for the firms. It does not use or provide access to a pre-existing, publicly available dataset. |
| Dataset Splits | No | The paper describes a simulated environment for its experiments but does not provide specific training, validation, or test dataset splits in the conventional sense for pre-existing data. The experimental setup details simulation parameters. |
| Hardware Specification | No | The paper mentions running "simulations" and discretizing a setup, but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to conduct these experiments. |
| Software Dependencies | No | The paper mentions that "UCB is used as a subroutine" in the experiments. UCB is an algorithm, not a specific software dependency with a version number. No other specific software components or their versions are mentioned. |
| Experiment Setup | Yes | We discretize the setup, consider a bandit instance (horizon $T = 5 \cdot 10^6$, 10 arms, average over 10 rounds) and we assume that UCB is used as a subroutine. In the first setting, there are no property rights and each firm runs UCB on their side. Second, property rights are defined and firm 2 runs BELGIC as its policy. |
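
The experiment setup above relies on UCB as a subroutine over a discretized, 10-arm bandit instance. As a rough illustration only, the sketch below runs a plain UCB index policy on 10 production levels. It is not BELGIC and it is not the authors' code. The profit shapes are assumptions: the paper's exact expressions are not legible in this extract, so the sketch uses hypothetical quadratics that peak at $q_1 = 5$ (firm 1's optimum) and $q_1 = 8$ (the social optimum). The Gaussian noise level, the noise-scaled exploration bonus, and the shortened horizon are also illustrative choices.

```python
import math
import random


def ucb(mean_rewards, horizon, noise_std=0.1, seed=0):
    """Run a UCB index policy and return the number of pulls per arm.

    Rewards are the arm's mean plus Gaussian noise (an assumption of this
    sketch). The exploration bonus is scaled by noise_std, which is the usual
    form for sub-Gaussian rewards.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + noise_std * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = mean_rewards[arm] + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical discretisation: 10 production levels q1 in {1, ..., 10}.
levels = list(range(1, 11))

# Hypothetical quadratic profits (NOT the paper's expressions).
firm1_profit = [-((q - 5) ** 2) / 25.0 for q in levels]   # peaks at q1 = 5
social_profit = [-((q - 8) ** 2) / 25.0 for q in levels]  # peaks at q1 = 8

HORIZON = 20_000  # far shorter than the paper's horizon, for a quick run

counts_firm1 = ucb(firm1_profit, HORIZON)
counts_social = ucb(social_profit, HORIZON)

# Empirical frequencies of each production level, as plotted in the paper.
freq_firm1 = [c / HORIZON for c in counts_firm1]
freq_social = [c / HORIZON for c in counts_social]

most_played_firm1 = levels[counts_firm1.index(max(counts_firm1))]
most_played_social = levels[counts_social.index(max(counts_social))]
```

Running UCB on `firm1_profit` mimics the setting without property rights, where the upstream firm concentrates on its own optimum. The run on `social_profit` only shows where play would concentrate if the learner's objective were the joint profit; reaching that outcome through transfers is what BELGIC is designed for, and that mechanism is not reproduced here.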