Variance Reduced Training with Stratified Sampling for Forecasting Models

Authors: Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate SCott and other baseline optimizers on both synthetic and real-world time series forecasting problems, and demonstrate SCott converges faster with respect to both iterations and wall clock time.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, Cornell University, Ithaca, NY, USA. 2 Amazon Web Services (AWS) AI Labs, Palo Alto, CA, USA. 3 Amazon Research, New York, NY, USA. 4 University of Pennsylvania, Philadelphia, PA, USA.
Pseudocode | Yes | Algorithm 1 SCott (Stochastic Stratified Control Variate Gradient Descent). (A hedged sketch of this update appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Traffic: A collection of hourly data from the California Department of Transportation. The data describes the road occupancy rates (between 0 and 1) measured by different sensors on San Francisco Bay area freeways. [...] Exchange-Rate: the collection of the daily exchange rates of eight foreign countries including Australia, British, Canada, Switzerland, China, Japan, New Zealand and Singapore ranging from 1990 to 2016. [...] Electricity: The electricity consumption in kWh was recorded hourly from 2012 to 2014, for n = 321 clients.
Dataset Splits | No | The paper mentions training and testing but does not specify validation splits or other detailed splitting methodology.
Hardware Specification | Yes | All the tasks run on a local machine configured with a 2.6GHz Intel(R) Xeon(R) CPU, 8GB memory and an NVIDIA GTX 1080 GPU.
Software Dependencies | No | The paper mentions 'PyTorchTS' but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | We set the hidden layer size to be 100 and the depth to be 2. Simple Feed Forward Network (MLP) with Negative Log Likelihood (NLL) loss (Alexandrov et al., 2019). We set the hidden layer size to be 80 and the depth to be 4. N-BEATS with MAPE loss (Oreshkin et al., 2019). We set the number of stacks to be 30. [...] We set τc = 3 days (72 hours) and τp = 1 day (24 hours). [...] we set τc = 8 days and τp = 1 day. [...] we set τc = 3 days (72 hours) and τp = 1 day (24 hours). (A hedged windowing and model sketch follows the table.)
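Pseudocode note: the paper gives SCott only as pseudocode (Algorithm 1). The snippet below is a minimal PyTorch sketch in the spirit of an SVRG-style update with a stratified anchor gradient; the function name `scott_train`, the uniform stratum sampling in the inner loop, and the fixed anchor refresh period are illustrative assumptions and do not reproduce the authors' implementation or their exact stratum weighting.

```python
# Hedged sketch of a stratified control-variate update in the spirit of
# Algorithm 1 (SCott). Not the authors' code: stratum sampling, weighting,
# and the anchor refresh schedule are simplified assumptions.
import copy
import random
import torch


def scott_train(model, strata, loss_fn, lr=0.01, anchor_period=10, steps=100):
    """strata: list of strata, each a list of (x, y) mini-batches."""
    sizes = [len(s) for s in strata]
    weights = [n / sum(sizes) for n in sizes]  # weight strata by size

    anchor = copy.deepcopy(model)
    anchor_grad = None

    def grad_of(net, batch):
        x, y = batch
        net.zero_grad()
        loss_fn(net(x), y).backward()
        return [p.grad.detach().clone() for p in net.parameters()]

    for t in range(steps):
        if t % anchor_period == 0:
            # Refresh the anchor: stratified estimate of the full gradient,
            # one sampled mini-batch per stratum, weighted by stratum size.
            anchor.load_state_dict(model.state_dict())
            anchor_grad = None
            for w, stratum in zip(weights, strata):
                g = grad_of(anchor, random.choice(stratum))
                if anchor_grad is None:
                    anchor_grad = [w * gi for gi in g]
                else:
                    anchor_grad = [a + w * gi for a, gi in zip(anchor_grad, g)]

        # Inner step: evaluate the same mini-batch at the current point and
        # at the anchor, and correct with the stratified anchor gradient.
        batch = random.choice(random.choice(strata))  # uniform here; a simplification
        g_cur = grad_of(model, batch)
        g_anc = grad_of(anchor, batch)

        with torch.no_grad():
            for p, gc, ga, mu in zip(model.parameters(), g_cur, g_anc, anchor_grad):
                p -= lr * (gc - ga + mu)  # control-variate corrected gradient
    return model
```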
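Experiment setup note: the quoted context/prediction windows (τc = 3 days, τp = 1 day on hourly data) and the MLP size (hidden 80, depth 4) map directly onto a windowing routine and a small feed-forward network. The sketch below is a hedged, self-contained PyTorch stand-in; the names `sliding_windows` and `FeedForwardForecaster` and the Gaussian NLL head are assumptions for illustration, whereas the paper uses GluonTS/PyTorchTS estimators rather than this code.

```python
# Hedged sketch of the windowing and feed-forward forecaster implied by the
# experiment setup row above (tau_c = 72 hours, tau_p = 24 hours,
# hidden size 80, depth 4, Gaussian NLL loss). Illustrative only.
import torch
import torch.nn as nn


def sliding_windows(series, tau_c=72, tau_p=24):
    """Split a 1-D series into (context, target) training windows."""
    xs, ys = [], []
    for start in range(len(series) - tau_c - tau_p + 1):
        xs.append(series[start:start + tau_c])
        ys.append(series[start + tau_c:start + tau_c + tau_p])
    return torch.stack(xs), torch.stack(ys)


class FeedForwardForecaster(nn.Module):
    def __init__(self, tau_c=72, tau_p=24, hidden=80, depth=4):
        super().__init__()
        layers, width = [], tau_c
        for _ in range(depth):
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        self.body = nn.Sequential(*layers)
        self.mean = nn.Linear(hidden, tau_p)       # predictive mean
        self.log_sigma = nn.Linear(hidden, tau_p)  # predictive scale (log)

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.log_sigma(h).exp()


def nll_loss(mean, sigma, target):
    """Gaussian negative log likelihood, matching the NLL loss quoted above."""
    return -torch.distributions.Normal(mean, sigma).log_prob(target).mean()
```

With an hourly series, `sliding_windows(series, tau_c=72, tau_p=24)` corresponds to the τc = 3 days / τp = 1 day setting quoted in the experiment setup row; the 8-day context used for the daily Exchange-Rate data would instead take `tau_c=8, tau_p=1`.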