Target Tracking for Contextual Bandits: Application to Demand Side Management

Authors: Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations on a real data set gathered by UK Power Networks, in which price incentives were offered, show that our strategies are effective and may indeed manage demand response by suitably picking the price levels.
Researcher Affiliation | Collaboration | (1) EDF R&D, Palaiseau, France; (2) Laboratoire de mathématiques d'Orsay, Université Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France; (3) INRIA - Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France.
Pseudocode | Yes | Protocol 1: Target Tracking for Contextual Bandits
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | We consider open data published by UK Power Networks and containing energy consumption (in kWh per half hour) at half hourly intervals of a thousand customers subjected to dynamic energy prices... Smart Meter Energy Consumption Data in London Households, see https://data.london.gov.uk/dataset/smartmeter-energyuse-data-in-london-households
Dataset Splits | No | The paper describes a 'training period' and a 'testing period' but does not specify a separate validation split, nor does it provide exact split percentages or sample counts for each partition (train/validation/test).
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific CPU or GPU models, or detailed cloud resource specifications.
Software Dependencies | No | The paper mentions the use of 'the R package mgcv' but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | We create one year of data using historical contexts and assume that only Normal tariffs are picked at first: $p_t = (0, 1, 0)$; this is a training period... Then the provider starts exploring the effects of tariffs for an additional month (a January month, based on the historical contexts) and freely picks the $p_t$ according to our algorithm; this is the testing period... For learning to then focus on the parameters indexed by $j$, as other parameters were decently estimated in the training period, we modify the exploration term $\alpha_{t,p}$ of (3) into $\alpha_{t,p} = 2 C B_{t-1}(\delta t^{-2}) \lVert V_{t-1}^{-1/2} \varphi(x_t, p) \rVert$ with $V_{t-1} = \lambda I_d + \sum_{s=1}^{t-1} \varphi(x_s, p_s) \varphi(x_s, p_s)^{\mathsf{T}}$. We pick a convenient $\lambda$.
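
The modified exploration term quoted in the Experiment Setup row can be illustrated with a small sketch. This is not the authors' code (none was released). It is a minimal standard-library Python illustration that builds the regularized Gram matrix from past feature vectors and evaluates 2 C B times the norm of V^{-1/2} phi, using the identity that the squared norm equals phi^T V^{-1} phi. The function names (`gram_matrix`, `solve_spd`, `exploration_term`) are hypothetical, and the confidence radius B_{t-1}(delta t^{-2}) is defined in the paper rather than in this summary, so it is passed in here as a precomputed number `b_value`.

```python
import math

def gram_matrix(past_features, lam, d):
    """Return V = lam * I_d + sum_s phi_s phi_s^T (d x d, list of lists)."""
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for phi in past_features:
        for i in range(d):
            for j in range(d):
                V[i][j] += phi[i] * phi[j]
    return V

def solve_spd(V, b):
    """Solve V z = b for symmetric positive definite V via Cholesky."""
    d = len(b)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution: L y = b
    y = [0.0] * d
    for i in range(d):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T z = y
    z = [0.0] * d
    for i in reversed(range(d)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, d))) / L[i][i]
    return z

def exploration_term(phi, past_features, lam, c_bound, b_value):
    """Evaluate 2 * C * B * ||V^{-1/2} phi||.

    phi           -- feature vector phi(x_t, p) for the candidate tariff p
    past_features -- list of phi(x_s, p_s) for s = 1, ..., t-1
    lam           -- regularization parameter lambda (> 0)
    c_bound       -- the constant C from the formula
    b_value       -- assumed precomputed value of B_{t-1}(delta * t^-2)
    """
    V = gram_matrix(past_features, lam, len(phi))
    z = solve_spd(V, phi)  # z = V^{-1} phi
    norm = math.sqrt(sum(p * q for p, q in zip(phi, z)))  # sqrt(phi^T V^{-1} phi)
    return 2.0 * c_bound * b_value * norm
```

With no past observations, V reduces to lambda times the identity, so the term is 2 C B times the Euclidean norm of phi divided by the square root of lambda. As more feature vectors pointing in a given direction accumulate, the term shrinks in that direction, which is what steers exploration toward tariffs whose effects are still poorly estimated.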