How does Facebook use AutoML

Automatic training of a model for time series forecasting


This article shows you how to configure and train a regression model for time series prediction using automated machine learning (AutoML) in the Azure Machine Learning Python SDK.

To do this, proceed as follows:

  • Prepare data for time series modeling
  • Configure specific time series parameters in an AutoMLConfig object
  • Run predictions on time series data

If you prefer a low-code experience, see the tutorial "Predicting Demand Using Automated Machine Learning" for a time series forecasting example that uses automated machine learning in Azure Machine Learning studio.

In contrast to classical time series methods, automated machine learning "pivots" past time series values so that they serve, together with other predictors, as additional dimensions for the regressor. This approach incorporates multiple contextual variables and their relationships to one another during training. Because multiple factors can influence a forecast, this method aligns well with real-world forecasting scenarios. For example, when forecasting sales, the result is driven by interactions between historical trends, the exchange rate, and the price.


This article requires the following:

  • An Azure Machine Learning workspace. To learn how to create the workspace, see Create an Azure Machine Learning workspace.

  • This article assumes you have a basic understanding of how to set up an automated machine learning experiment. Use the tutorial or guide to familiarize yourself with the most important design patterns for automated ML experiments.

Prepare the data

The most important difference between a forecasting regression task and a standard regression task in AutoML is the inclusion of a feature in your data that represents a valid time series. A regular time series has a well-defined and consistent frequency and a value at every sample point in a continuous time span.

Consider a dataset of daily sales data for a company that has two physical stores, A and B.

The dataset also contains:

  • A feature that enables the model to recognize weekly seasonality.
  • A time column that represents a clean time series with daily frequency.
  • The target column for running predictions.

Read the data into a pandas DataFrame, and then convert the time column to a datetime type so that it is treated as a time series.

In this case, the data is already sorted in ascending order according to the time field. However, when setting up an experiment, care must be taken to sort the desired time column in ascending order to create a valid time series.
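The loading and sorting step can be sketched as follows. This is a minimal illustration that uses only the Python standard library in place of pandas so that it runs anywhere. The column names (day_datetime, store, week_of_year, sales_quantity) and the inline CSV text are hypothetical stand-ins, because the sample file itself is not reproduced here.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample content; the real file is not shown in this article.
RAW_CSV = """day_datetime,store,week_of_year,sales_quantity
2019-01-02,A,1,32
2019-01-01,A,1,31
2019-01-01,B,1,17
2019-01-02,B,1,20
"""


def load_sales(text):
    """Parse CSV text, convert the time column to datetime, and sort ascending."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            # Parse the text timestamp into a real datetime value.
            "day_datetime": datetime.strptime(record["day_datetime"], "%Y-%m-%d"),
            "store": record["store"],
            "week_of_year": int(record["week_of_year"]),
            "sales_quantity": float(record["sales_quantity"]),
        })
    # Sort by time (then by store) so the time column is in ascending order.
    rows.sort(key=lambda r: (r["day_datetime"], r["store"]))
    return rows


data = load_sales(RAW_CSV)
```

With pandas, the same steps would typically be a CSV read, a datetime conversion of the time column, and a sort on that column.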

The data is then prepared as follows:

  • Assuming the data contains 1,000 records, it is split deterministically into training and test datasets.
  • The label column is identified.
  • The label is separated from the test data to form the test target set.
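The split described in the list above might look like the following sketch. It uses plain Python lists of dictionaries rather than a DataFrame. The label name sales_quantity and the 950/50 split point are assumptions chosen so that the test set holds 50 records, and the generated records are placeholder data.

```python
# Hypothetical stand-in for the 1,000-record dataset: one dict per record,
# already sorted in ascending time order.
records = [{"step": i, "sales_quantity": float(i % 7)} for i in range(1000)]

LABEL = "sales_quantity"  # assumed name of the label column

# Deterministic, time-ordered split: first 950 rows for training,
# last 50 rows for testing. No shuffling, so the time order is preserved.
train_data = records[:950]
test_data = [dict(row) for row in records[-50:]]  # copy so originals stay intact

# Separate the label from the test features to form the test target set.
test_target = [row.pop(LABEL) for row in test_data]
```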


When training a model to predict future values, make sure that any features used during training can be used in making predictions for your desired forecast horizon.

So if you are forecasting demand, for example, the inclusion of a feature for the current stock price can greatly improve training accuracy. However, if you use a forecast horizon that is far in the future, future stock prices for future time series points may not be accurately predictable, which can adversely affect model accuracy.

Training and validation data

You can specify separate training and validation datasets directly in the AutoMLConfig object. Learn more about AutoMLConfig.

For time series forecasting, only rolling origin cross validation (ROCV) is used for validation. Pass the training and validation data together, and set the number of cross-validation folds in your AutoMLConfig. ROCV divides the series into training and validation data using an origin time point. Sliding the origin forward in time generates the cross-validation folds. This strategy preserves the integrity of the time series data and eliminates the risk of data leakage.
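To make the rolling origin idea concrete, here is a generic sketch of how such folds can be generated from row positions. It is an illustration of the technique, not the AutoML implementation: the function name, the step between origins, and the fold placement are assumptions, and the service may choose them differently.

```python
def rolling_origin_folds(n_samples, n_folds, horizon, step=None):
    """Yield (train_indices, valid_indices) pairs for time-ordered data.

    Each fold trains on every point before an origin and validates on the
    next `horizon` points. The origin moves forward by `step` per fold.
    """
    step = horizon if step is None else step
    first_origin = n_samples - horizon - (n_folds - 1) * step
    if first_origin <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        origin = first_origin + k * step
        yield list(range(origin)), list(range(origin, origin + horizon))


folds = list(rolling_origin_folds(n_samples=20, n_folds=3, horizon=4))
for train_idx, valid_idx in folds:
    # Validation points always come strictly after the training points,
    # which is what prevents leakage from the future into training.
    print(len(train_idx), valid_idx)
```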

You can also import your own validation data. For more information, see Configuring Data Split and Cross-Validation in AutoML.

Learn more about how AutoML uses cross-validation to prevent overfitting models.

Configure the experiment

The AutoMLConfig object defines the settings and data required for an automated machine learning task. Configuring a forecasting model is similar to setting up a standard regression model, but certain models, configuration options, and featurization steps are specific to time series data.

Supported models

In automated machine learning, various models and algorithms are automatically tested as part of the model creation and optimization process. As a user, you do not have to specify the algorithm. In predictive experiments, both native time series and deep learning models are part of the recommendation system. The following table summarizes this subset of models.


Traditional regression models are also tested as part of the recommendation system for predictive experiments. For the full list of models, see the Supported Models table.

  • Prophet (preview). Prophet works best with time series that have strong seasonal effects and that span several seasons of historical data. To take advantage of this model, install it locally. Advantages: fast and accurate; robust to outliers, missing data, and dramatic changes in the time series.
  • Auto-ARIMA (preview). The ARIMA (Auto-Regressive Integrated Moving Average) method performs best when the data is stationary. This means that statistical properties such as the mean and variance are constant over the entire dataset. For example, if you toss a coin, your chance of heads is 50% regardless of whether you toss it today, tomorrow, or next year. Advantages: well suited to univariate series, because past values are used to predict future values.
  • ForecastTCN (preview). ForecastTCN is a neural network model designed for the most demanding forecasting tasks. It captures non-linear local and global trends in your data as well as relationships between time series. Advantages: can take advantage of complex trends in your data and scales readily to the largest datasets.

Configuration settings

You define standard training parameters such as task type, number of iterations, training data and number of cross-validations (similar to a regression problem). In the case of prediction tasks, however, additional parameters must be specified for the experiment.

An overview of the additional parameters can be found in the following table. For syntax design patterns, see the ForecastingParameters class reference documentation.

  • Time column name: specifies the datetime column in the input data that is used to build the time series and infer its frequency.
  • Forecast horizon: defines how many periods ahead you want to forecast. The horizon is given in units of the time series frequency, which are based on the time interval of your training data, for example monthly or weekly.
  • Forecasting DNNs: enables forecasting DNNs.
  • Time series identifier column names: the column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If no time series identifiers are defined, the dataset is assumed to be a single time series. For more information on single time series, see the energy_demand_notebook.
  • Frequency: the frequency of the time series dataset. This parameter represents the period with which events are expected to occur, such as daily, weekly, or yearly. The frequency must be a pandas offset alias. Learn more about frequency.
  • Target lags: the number of rows to lag the target values by, based on the frequency of the data. The lag is specified as a list or as a single integer. Use a lag when the relationship between the independent variables and the dependent variable does not match up or correlate by default.
  • Feature lags: when target lags are set and feature lags are enabled, automated ML decides automatically which features to lag. Enabling feature lags can help improve accuracy. Feature lags are disabled by default.
  • Target rolling window size: the number n of historical periods used to generate forecast values, which must be <= the size of the training set. If omitted, n is the full training set size. Specify this parameter when you want to consider only a certain amount of history when training the model. Learn more about target rolling window aggregation.
  • Short series handling: allows short time series to be handled so that training does not fail because of insufficient data. Short series handling is enabled by default. Learn more about short series handling.
  • Target aggregation function: the function used to aggregate the time series target column to match the specified frequency. The frequency parameter must be set for the aggregation function to be used. Learn more about target column aggregation.
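As an illustration of what lagging the target means, the following sketch shifts a target series by a given number of rows so that each row can see earlier target values as features. The helper name and the sample values are hypothetical, and this shows the general idea rather than how automated ML builds its lag features internally.

```python
def add_target_lags(target, lags):
    """Return one lagged copy of `target` per requested lag.

    Position t of the lag-k list holds target[t - k], or None when there
    is not enough history yet.
    """
    return {
        lag: [target[t - lag] if t >= lag else None for t in range(len(target))]
        for lag in lags
    }


# Hypothetical target values in ascending time order.
lagged = add_target_lags([10, 12, 9, 14], lags=[1, 2])
```

Here a lag of 1 means each row sees the previous row's target, and a lag of 2 means it sees the target from two rows earlier.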

The forecasting configuration for this example does the following:

  • It uses the ForecastingParameters class to define the forecasting parameters for your experiment training.
  • It sets the time column name to the date field in the dataset.
  • It defines the time series identifier as the store column. This ensures that two separate time series groups are created for the data: one for store A and one for store B.
  • It sets the forecast horizon to 50 to forecast for the entire test set.
  • It sets a forecast window of 10 periods with the target rolling window size.
  • It specifies a single lag on the target values of two periods ahead with the target lags parameter.
  • It sets the target lags to the recommended "auto" setting, which detects this value automatically.

These parameters are then passed to the standard AutoMLConfig object along with the task type, primary metric, exit criteria, and training data.
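The settings in the list above can be collected as keyword arguments, as in the following sketch. The parameter names reflect the ForecastingParameters class reference as best understood here and should be checked against the SDK version you use. The column names day_datetime and store are hypothetical. The SDK call itself is shown only as a comment so that the snippet runs without the SDK installed.

```python
# Keyword arguments mirroring the settings described in the list above.
# Assumption: the parameter names match the ForecastingParameters
# constructor in your SDK version. Column names are hypothetical.
forecasting_kwargs = {
    "time_column_name": "day_datetime",        # the date field in the dataset
    "time_series_id_column_names": ["store"],  # one series per store (A and B)
    "forecast_horizon": 50,                    # covers the entire test set
    "target_rolling_window_size": 10,          # forecast window of 10 periods
    "target_lags": "auto",                     # let the lag be detected
}

# With the SDK installed, these would be passed on roughly like this:
#   from azureml.automl.core.forecasting_parameters import ForecastingParameters
#   forecasting_parameters = ForecastingParameters(**forecasting_kwargs)
```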

The amount of data required to successfully train a forecasting model with automated machine learning depends on the forecast horizon, the number of cross-validation folds, and the target lags or target rolling window size that you specify when you configure your AutoMLConfig.

The following formula calculates the amount of historical data needed to construct time series features.

Minimum required historical data: (2 x forecast horizon) + number of cross-validation folds + max(max(target lags), target rolling window size)

An error is raised for any series in the dataset that does not meet the required amount of historical data for the relevant settings.
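The following sketch evaluates the minimum history requirement for a set of example settings. It assumes the formula reads (2 x forecast horizon) + number of cross-validation folds + max(max(target lags), target rolling window size). The function name and the example values (including five folds) are illustrative only.

```python
def minimum_history(forecast_horizon, n_cross_validations,
                    target_lags=(), target_rolling_window_size=0):
    """Number of historical points needed per series under the assumed formula."""
    # Largest requested lag, or 0 when no lags are used.
    max_lag = max(target_lags, default=0)
    lookback = max(max_lag, target_rolling_window_size)
    return 2 * forecast_horizon + n_cross_validations + lookback


# Example: horizon 50, 5 folds, a lag of 2, and a rolling window of 10.
required = minimum_history(
    forecast_horizon=50,
    n_cross_validations=5,
    target_lags=[2],
    target_rolling_window_size=10,
)
print(required)  # 2 * 50 + 5 + max(2, 10) = 115
```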

Featurization steps

By default, every automated machine learning experiment applies automatic scaling and normalization techniques to your data. These techniques are types of featurization that help certain algorithms that are sensitive to features on different scales. For more information on the default featurization steps, see Featurization in AutoML.

However, the following steps are performed only for forecasting task types:

  • Detect the interval of the time series sample (e.g., hourly, daily, weekly) and create new records for missing points in time to get an uninterrupted series.
  • Impute missing values in the target column (using forward fill) and in the feature columns (using median column values).
  • Create features based on time series identifiers to enable fixed effects across different series.
  • Create time-based features to help learn seasonal patterns.
  • Encode categorical variables as numeric quantities.
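The first two steps in the list above (creating records for missing time points and forward-filling the target) can be illustrated with a small standard-library sketch. It assumes a daily frequency rather than detecting it, and the function name and sample values are hypothetical. It is not the featurization code that automated ML runs.

```python
from datetime import date, timedelta


def fill_daily_gaps(series):
    """Insert missing days and forward-fill the target.

    `series` maps date -> target value. Returns a list of (date, value)
    covering every day from the first to the last observed date.
    """
    days = sorted(series)
    filled, last_value = [], None
    current = days[0]
    while current <= days[-1]:
        value = series.get(current)
        if value is None:
            value = last_value  # forward fill from the most recent observation
        filled.append((current, value))
        last_value = value
        current += timedelta(days=1)
    return filled


# January 3 is missing from the observations.
observed = {
    date(2019, 1, 1): 10.0,
    date(2019, 1, 2): 12.0,
    date(2019, 1, 4): 9.0,
}
regular = fill_daily_gaps(observed)
```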

For a summary of the features created by these steps, see Featurization Transparency.


The automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric data, and so on) become part of the underlying model. When you use the model for predictions, the same featurization steps that were applied during training are automatically applied to your input data.

Customizing the featurization

You can also adjust the featurization settings to ensure that the data and features used to train your ML model result in relevant predictions.

Supported customizations for forecasting tasks include:

  • Column purpose update: overrides the automatically detected feature type for the specified column.
  • Transformer parameter update: updates the parameters for the specified transformer. Currently supports Imputer (fill_value and median).
  • Drop columns: specifies columns to drop from being featurized.

To customize featurization with the SDK, specify the featurization settings in your AutoMLConfig object. Learn more about custom featurization.


The drop columns functionality has been deprecated since SDK version 1.19. Drop columns from your dataset as part of data cleansing before you use it in your automated ML experiment.

If you're using Azure Machine Learning Studio for your experiment, see Customize featurization in Studio.

Optional configurations

Additional optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling window aggregation.

Frequency and target data aggregation

Use the frequency parameter to avoid failures caused by irregular data, that is, data that does not follow a set cadence such as hourly or daily.

For highly irregular data or for varying business needs, you can optionally set the desired forecast frequency and specify a target aggregation function to aggregate the target column of the time series. Using these two settings in your AutoMLConfig object can save time on data preparation.

When a target aggregation function is used:

  • The target column values are aggregated based on the specified operation.

  • Numerical predictor columns in your data are aggregated by sum, mean, minimum value, and maximum value. As a result, automated ML generates new columns suffixed with the name of the aggregation function and applies the selected aggregation operation.

  • For categorical predictor columns, the data is aggregated by mode, that is, the most prominent category in the window.

  • Date predictor columns are aggregated by minimum value, maximum value, and mode.

Supported aggregation operations for target column values:

  • Sum of the target values
  • Mean or average of the target values
  • Minimum value of the target
  • Maximum value of the target
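To show what aggregating the target to a coarser frequency involves, the sketch below groups irregular intraday observations by day and applies one of the four operations. The operation labels ("sum", "mean", "min", "max"), the helper name, and the sample data are assumptions for illustration and do not claim to match the SDK's accepted values.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed labels for the four supported operations.
AGGREGATIONS = {"sum": sum, "mean": mean, "min": min, "max": max}


def aggregate_daily(observations, how="sum"):
    """Aggregate (timestamp, value) pairs to one value per calendar day."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp.date()].append(value)
    func = AGGREGATIONS[how]
    return {day: func(values) for day, values in sorted(buckets.items())}


# Irregular intraday observations (hypothetical values).
hourly = [
    (datetime(2017, 9, 8, 1), 4.0),
    (datetime(2017, 9, 8, 13), 6.0),
    (datetime(2017, 9, 9, 2), 5.0),
]
daily_sum = aggregate_daily(hourly, "sum")
daily_max = aggregate_daily(hourly, "max")
```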

Enable deep learning


DNN support for automated machine learning predictions is in the Preview phase and is not supported for local executions.

You can also apply deep learning with deep neural networks (DNNs) to improve the scores of your model. Deep learning in automated machine learning allows forecasting of univariate and multivariate time series data.

Deep learning models have three intrinsic capabilities:

  1. They can learn from arbitrary mappings from inputs to outputs.
  2. They support multiple inputs and outputs.
  3. They can automatically extract patterns in input data that span long sequences.

To enable deep learning, turn on the DNN setting in the AutoMLConfig object.

To learn how to enable DNN for an AutoML experiment created in Azure Machine Learning Studio, see the Task Type Settings step-by-step guide in Studio.

For a detailed code example of using DNNs, see the Beverage Production Prediction Notebook.

Target rolling window aggregation

Often the best information a forecasting model can have is the recent value of the target. Target rolling window aggregation lets you add a rolling aggregation of data values as features. Generating and using these features as additional contextual data helps improve the accuracy of the trained model.

Suppose you want to forecast energy demand. You might add a rolling window feature of three days to account for thermal changes in heated spaces. In this example, the window is created by setting the target rolling window size to 3 in the AutoMLConfig constructor.

The table shows the resulting feature engineering that occurs when window aggregation is applied. Columns for the minimum, maximum, and sum values are generated on a sliding window of three entries based on the defined settings. Each row has a new calculated feature. For the timestamp September 8, 2017, 4:00 a.m., the maximum, minimum, and sum values are calculated from the demand values for September 8, 2017, 1:00 a.m. to 3:00 a.m. This window of three shifts along to populate data for the remaining rows.
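The window calculation described above can be reproduced with a short sketch. For each row it aggregates the three entries immediately before it, so the row at 4:00 a.m. would use the values from 1:00 a.m. to 3:00 a.m. The function name and the demand values are hypothetical.

```python
def rolling_target_features(values, window=3):
    """For each position, aggregate the `window` values strictly before it."""
    features = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        if len(past) < window:
            features.append(None)  # not enough history for a full window yet
        else:
            features.append({"min": min(past), "max": max(past), "sum": sum(past)})
    return features


demand = [5, 7, 6, 9, 4]  # hypothetical hourly demand values, oldest first
features = rolling_target_features(demand, window=3)
```

The current row's own value is excluded from its window, which keeps the feature usable at prediction time when that value is not yet known.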

Take a look at a Python code example that uses the rolling time window aggregation feature as a target.

Short series handling

In automated machine learning, a time series is considered a short series when there are not enough data points to complete the training and validation phases of model development. The required number of data points varies by experiment and depends on the max_horizon value, the number of cross-validation splits, and the length of the model's lookback period, that is, the maximum history required to construct the time series features. For the exact calculation, see the reference documentation for short_series_handling_configuration.

With the short_series_handling_configuration parameter, automated machine learning handles short series by default.

To enable short series handling, the frequency parameter must also be defined. To define an hourly frequency, set it to the pandas offset alias for hours. See the options for frequency strings. To change the default behavior, update the short_series_handling_configuration parameter in your forecasting parameters.

The following list summarizes the available settings for short series handling.

  • Default (automatic) behavior:
      • If all series are short, the data is padded.
      • If not all series are short, the short series are dropped.
  • Padding: automated machine learning adds random values to each short series it finds. The column types are padded as follows:
      • Object columns with NaN (Not a Number) values
      • Numeric columns with 0
      • Boolean/logical columns with False
      • The target column with random values that have a mean of 0 and a standard deviation of 1
  • Dropping: automated machine learning drops the short series, and they are not used for training or prediction. Predictions for these series return NaN values.
  • No handling: no series is padded or dropped.


Padding can affect the accuracy of the resulting model, because it introduces artificial data just to get past training without failures.

If many of the series are short, you may also see some impact on the explainability results.
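The default rule (pad when every series is short, otherwise drop the short ones) can be expressed as a small decision function. This is a sketch of the rule as described in this section, with a hypothetical function name and made-up series lengths. It does not perform the padding itself.

```python
def handle_short_series(series_lengths, min_length):
    """Apply the default rule to a mapping of series id -> number of points.

    Returns (action, affected_ids), where action is "pad", "drop", or "none".
    """
    short = sorted(name for name, n in series_lengths.items() if n < min_length)
    if not short:
        return "none", []
    if len(short) == len(series_lengths):
        return "pad", short   # every series is short: pad the data
    return "drop", short      # only some are short: drop those series


# Store B has too little history, store A has enough.
action, affected = handle_short_series({"A": 120, "B": 40}, min_length=115)
```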

Run the experiment

When your AutoMLConfig object is ready, you can submit the experiment. After the model finishes, retrieve the best run iteration.

Forecast with the best model

Use the best model iteration to forecast values for the test dataset.

The forecasting function lets you specify when predictions should start, unlike the prediction method that is typically used for classification and regression tasks.

For example, suppose you first replace all target values passed to the function with NaN. In this case, the forecast origin is at the end of the training data. However, if you replace only the second half of the target values with NaN, the function leaves the numeric values in the first half unchanged but forecasts the NaN values in the second half. The function returns both the forecasted values and the aligned features.

You can also use the forecast destination parameter of the forecasting function to forecast values up to a specified date.

Calculate the root mean squared error (RMSE) between the actual values and the predicted values.
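A minimal RMSE calculation using only the standard library is sketched below. The function name and the sample values are placeholders for the actual and predicted targets that an experiment would produce.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted values.
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```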

After determining the overall model accuracy, the next realistic step is to use the model to forecast unknown future values.

Provide a dataset in the same format as the test dataset but with future datetimes, and the resulting prediction set contains the forecasted values for each time series step. For example, suppose the last time series records in the dataset were for 12/31/2018. To forecast demand for the following day (or for as many periods as you need, up to the forecast horizon), create a single time series record for 01/01/2019 for each store.

Repeat the necessary steps to load this future data into a data frame, and then run the forecasting function to predict future values.
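Building the future records for the example above might look like the following sketch, which creates one record per store for each day after the last known date. The column names day_datetime and store and the helper name are hypothetical, and any additional feature columns your model was trained on would also need to be supplied.

```python
from datetime import date, timedelta


def future_rows(last_date, stores, periods=1):
    """Create one record per store for each of the next `periods` days."""
    rows = []
    for store in stores:
        for step in range(1, periods + 1):
            rows.append({
                "day_datetime": last_date + timedelta(days=step),
                "store": store,
            })
    return rows


# Last known records are for 12/31/2018, so forecast 01/01/2019 per store.
future = future_rows(date(2018, 12, 31), stores=["A", "B"], periods=1)
```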


In-sample predictions are not supported for forecasting with automated machine learning when target lags and/or the target rolling window size are enabled.

Sample notebooks

See the forecasting sample notebooks for detailed code examples of advanced forecasting configuration.

Next Steps