Variance Reduced Training with Stratified Sampling for Forecasting Models
Authors: Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate SCott and other baseline optimizers on both synthetic and real-world time series forecasting problems, and demonstrate SCott converges faster with respect to both iterations and wall clock time. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Cornell University, Ithaca, NY, USA. 2Amazon Web Services (AWS) AI Labs, Palo Alto, CA, USA. 3Amazon Research, New York, NY, USA. 4University of Pennsylvania, Philadelphia, PA, USA. |
| Pseudocode | Yes | Algorithm 1 SCott (Stochastic Stratified Control Variate Gradient Descent). A hedged sketch of the update appears below the table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Traffic: A collection of hourly data from the California Department of Transportation. The data describes the road occupancy rates (between 0 and 1) measured by different sensors on San Francisco Bay Area freeways. [...] Exchange-Rate: the collection of the daily exchange rates of eight foreign countries including Australia, Britain, Canada, Switzerland, China, Japan, New Zealand and Singapore ranging from 1990 to 2016. [...] Electricity: The electricity consumption in kWh was recorded hourly from 2012 to 2014, for n = 321 clients. |
| Dataset Splits | No | The paper mentions training and testing but does not specify validation splits or other detailed splitting methodology. |
| Hardware Specification | Yes | All the tasks run on a local machine configured with a 2.6GHz Intel(R) Xeon(R) CPU, 8GB memory and an NVIDIA GTX 1080 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorchTS' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We set the hidden layer size to be 100 and the depth to be 2. [...] Simple Feed Forward Network (MLP) with Negative Log Likelihood (NLL) loss (Alexandrov et al., 2019). We set the hidden layer size to be 80 and the depth to be 4. [...] N-BEATS with MAPE loss (Oreshkin et al., 2019). We set the number of stacks to be 30. [...] We set τ_c = 3 days (72 hours) and τ_p = 1 day (24 hours). [...] we set τ_c = 8 days and τ_p = 1 day. [...] we set τ_c = 3 days (72 hours) and τ_p = 1 day (24 hours). A sketch of this context/prediction windowing follows the table. |
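The pseudocode row above refers to Algorithm 1 (SCott). As a rough illustration of the idea, the sketch below implements an SVRG-style gradient estimator with stratified control variates: an anchor gradient is formed as a weighted average of per-stratum gradients, and each inner step corrects a single-sample gradient against the anchor. The least-squares loss, the size-proportional stratum sampling, the synthetic data, and all hyperparameters are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def scott_sketch(X_strata, y_strata, steps=500, inner=50, lr=0.01, rng=None):
    """Illustrative SVRG-style update with stratified control variates.

    X_strata / y_strata: lists of per-stratum design matrices and targets.
    NOT the authors' code: the loss, sampling scheme, and hyperparameters
    are placeholders chosen for this sketch.
    """
    rng = np.random.default_rng(rng)
    d = X_strata[0].shape[1]
    w = np.zeros(d)
    sizes = np.array([len(y) for y in y_strata], dtype=float)
    probs = sizes / sizes.sum()  # sample strata proportionally to size

    def grad(X, y, w):  # gradient of 0.5 * ||Xw - y||^2 / n
        return X.T @ (X @ w - y) / len(y)

    for _ in range(steps // inner):
        w_anchor = w.copy()
        # Stratified anchor gradient: weighted sum of per-stratum gradients.
        mu = sum(p * grad(X, y, w_anchor)
                 for p, X, y in zip(probs, X_strata, y_strata))
        for _ in range(inner):
            s = rng.choice(len(probs), p=probs)   # pick a stratum
            i = rng.integers(len(y_strata[s]))    # pick a point inside it
            Xi, yi = X_strata[s][i:i+1], y_strata[s][i:i+1]
            # Control-variate step: single-sample gradient, corrected
            # against the anchor, recentered by the stratified average.
            g = grad(Xi, yi, w) - grad(Xi, yi, w_anchor) + mu
            w -= lr * g
    return w

# Example: three synthetic strata with different scales.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(100, 5)) * s for s in (0.5, 1.0, 2.0)]
w_true = rng.normal(size=5)
ys = [X @ w_true for X in Xs]
w_hat = scott_sketch(Xs, ys, rng=0)
```

When the points within a stratum are similar, the difference between the current and anchor gradients has low variance, which is the mechanism the paper credits for SCott's faster convergence.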
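The τ_c / τ_p settings quoted in the experiment-setup row describe a context window and a prediction horizon. The sketch below shows how such windows slice an hourly series into training instances, using the τ_c = 72 hours / τ_p = 24 hours setting quoted above; the 24-hour stride is an assumption, since the paper's exact instance-generation procedure is not quoted.

```python
import numpy as np

def rolling_windows(series, context=72, horizon=24, stride=24):
    """Slice a 1-D hourly series into (context, target) training pairs.

    context=72 and horizon=24 mirror the tau_c = 3 days / tau_p = 1 day
    setting; the stride is an assumption, not taken from the paper.
    """
    pairs = []
    for start in range(0, len(series) - context - horizon + 1, stride):
        ctx = series[start:start + context]                       # 3-day context
        tgt = series[start + context:start + context + horizon]   # 1-day target
        pairs.append((ctx, tgt))
    return pairs

# Example: two weeks of hourly data yields overlapping 72h -> 24h instances.
hourly = np.sin(np.arange(14 * 24) * 2 * np.pi / 24)
print(len(rolling_windows(hourly)))  # 11 windows with a 24-hour stride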