OrbitML Python Prediction Package — The Easy Way

Michael Rowe
Mar 1, 2023

Orbit is an open-source Python package for Bayesian time-series forecasting. We’re going to show how to use it with one line of code.

Orbit provides a small family of Bayesian structural time-series models, including exponential smoothing (ETS), Local Global Trend (LGT), Damped Local Trend (DLT), and Kernel-based Time-varying Regression (KTR).

One notable feature of Orbit is its use of probabilistic modeling to capture the uncertainty inherent in time-series data. This allows users to obtain probabilistic forecasts and credible intervals, which are more informative than point forecasts alone because they show how much confidence each prediction deserves.

Orbit also includes tools for backtesting and hyperparameter tuning, which help in choosing a model for a given time-series dataset, as well as diagnostic and visualization tools that help in understanding how the model is making predictions and in diagnosing any issues with it.

Another advantage of Orbit is its flexibility and scalability. It can handle time-series datasets with multiple levels of aggregation, such as hierarchical or grouped time-series data. It can also handle datasets with missing values, making it suitable for real-world time-series datasets.

Installs

Install the package (published on PyPI as orbit-ml), along with a client for grabbing time-series data as usual.

Grab some data, and check the version:
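The original snippet for this step is not reproduced here, so the following is a minimal stand-in rather than the author's code. It assumes the package was installed with pip install orbit-ml, and it replaces the data client with a synthetic, strictly positive series. The helper names installed_version and synthetic_series are hypothetical.

```python
import random
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="orbit-ml"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def synthetic_series(n=200, seed=42):
    """Stand-in for real data: a positive series following a geometric random walk."""
    rng = random.Random(seed)
    level, ys = 100.0, []
    for _ in range(n):
        level *= 1.0 + rng.gauss(0.0, 0.01)  # small multiplicative shocks keep it positive
        ys.append(level)
    return ys


ys = synthetic_series()
print("orbit-ml version:", installed_version())
print("first observations:", [round(y, 2) for y in ys[:5]])
```

Swap synthetic_series for whichever data client you normally use; everything downstream only needs a plain list of numbers.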

Making a one-line version for univariate prediction
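The original listing is not reproduced here. What follows is a sketch reconstructed from the description in this section, not the author's exact code. It assumes a strictly positive input series, a recent Orbit release in which the data is passed to fit, and a prediction frame that exposes a prediction column. The function names prepare_frames and orbit_lgt_next and the seed value are hypothetical.

```python
import numpy as np
import pandas as pd


def prepare_frames(ys, start="2021-01-01"):
    """Build the training frame (log1p response, daily dates) and a one-row future frame."""
    train_df = pd.DataFrame({
        "ds": pd.date_range(start=start, periods=len(ys), freq="D"),
        "y": np.log1p(np.asarray(ys, dtype=float)),
    })
    next_day = train_df["ds"].iloc[-1] + pd.Timedelta(days=1)
    future_df = pd.DataFrame({"ds": [next_day]})
    return train_df, future_df


def orbit_lgt_next(ys, seed=2021):
    """Hypothetical wrapper: one-step-ahead forecast of a positive series with LGT."""
    # Imported lazily so prepare_frames can be used without Orbit installed.
    from orbit.models import LGT

    train_df, future_df = prepare_frames(ys)
    lgt = LGT(response_col="y", date_col="ds", estimator="stan-map", seed=seed)
    lgt.fit(df=train_df)
    predicted = lgt.predict(df=future_df)
    # Undo the log1p transform and rule out negative forecasts.
    return float(np.clip(np.expm1(predicted["prediction"].iloc[-1]), 0.0, None))
```

With this in place, a forecast is a single call such as orbit_lgt_next(ys), where ys is a list of past observations.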

There’s a lot there, which is precisely why we’re wrapping it.

This code fits a Bayesian LGT (Local Global Trend) model to a time series and uses it to predict future time periods.

The first part of the code creates a Pandas DataFrame from the input time-series data ys, and applies a log transformation to the response variable y. Since the prediction is later inverted with np.expm1, the transformation is log(1 + y), which stays defined when y is zero. It then creates a daily date range for the DataFrame starting from January 1, 2021, and sets it as the date column ds.

The LGT model is then initialized with the transformed DataFrame as the input data. The response_col parameter specifies the column name of the response variable, date_col specifies the column name of the date variable, and estimator specifies the estimation method to use for the model. Here, the 'stan-map' estimator is used, which computes a maximum a posteriori (MAP) point estimate with Stan's optimizer instead of sampling the posterior. That makes it fast, which matters when the model is refitted at every step. Full Markov Chain Monte Carlo (MCMC) sampling with the No-U-Turn Sampler (NUTS) is what the 'stan-mcmc' estimator provides. The seed parameter is set to ensure reproducibility of the results.

The second part of the code creates a new DataFrame df from the ys input data, and sets the date column ds in the same way as the training data. It then creates a new DataFrame future_df containing a single future time period: the last date in df plus one day.

Finally, the lgt.predict method is used to generate a prediction for the future time period, and the predicted value is transformed back to the original scale using the np.expm1 function. The np.clip function is used to ensure that the predicted value is non-negative. The resulting value is the prediction for the next time period in the time-series data.
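A small numeric check of that round trip may help. This is an illustrative sketch with made-up values, assuming the forward transform is np.log1p, which is the inverse of np.expm1.

```python
import numpy as np

ys = np.array([0.0, 3.0, 120.0])
transformed = np.log1p(ys)          # log(1 + y), defined even at y = 0
recovered = np.expm1(transformed)   # exp(z) - 1 undoes the transform

# A model fitted on the log1p scale can return a value below zero,
# which maps to a negative number on the original scale.
raw_forecast = np.expm1(-0.2)
safe_forecast = np.clip(raw_forecast, 0.0, None)  # floor at zero, no upper bound

print(recovered, raw_forecast, safe_forecast)
```

The clip only matters for series that cannot be negative, such as counts or prices; for other data it would distort the forecast.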

Woah. Quite a lot to get one number!

The Easy Way

Well, no matter. Now you just use it. Note that the actual call to the Orbit model is one line only:
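The original loop is not reproduced here; a minimal walk-forward sketch could look like the following. To keep it runnable without Orbit, it uses a hypothetical stand-in forecaster, last_value. In practice you would pass your own one-line wrapper around the Orbit model in its place.

```python
def last_value(ys):
    """Hypothetical stand-in forecaster: predict that the next value equals the last one."""
    return ys[-1]


def walk_forward(ys, forecaster, burn_in=50):
    """Call forecaster(history) at every step after burn_in and collect the predictions."""
    predictions = []
    for t in range(burn_in, len(ys)):
        history = ys[:t]                         # only data observed before time t
        predictions.append(forecaster(history))  # the one-line call
    return predictions


ys = [1.0, 2.0, 3.0, 4.0, 5.0]
print(walk_forward(ys, last_value, burn_in=2))
```

Each prediction at step t is paired with the actual value ys[t], so the two lists can be compared or plotted directly.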

And the plot, of course:

The Even Easier Way

You can simply import one-line forecasting functions from the timemachines package; see the examples that ship with it. One-line functions for time-series forecasting, such as those provided by timemachines or the one we created here, can be a useful tool for quickly and easily generating forecasts.

One of the main advantages of using a one-line forecasting function is that it can help to avoid data leakage: the function only ever sees the history up to the current time, so it cannot peek at the values it is asked to predict. One-line forecasting functions can also help to mitigate the risk of overfitting, because they make it easy to combine your model with other simple, low-complexity models, for instance through precision-weighted ensembling. The ensemble can still generate reasonable forecasts when your model is underperforming, for whatever reason.
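Precision-weighted ensembling can be sketched in a few lines. This is an illustrative version, not the timemachines implementation: each model gets a weight proportional to the inverse of its mean squared error on past predictions. The function names and the example numbers are hypothetical.

```python
from statistics import fmean


def precision_weights(errors_by_model, floor=1e-12):
    """Weight each model by the inverse of its mean squared past error."""
    precisions = {
        name: 1.0 / max(fmean(e * e for e in errors), floor)  # floor avoids division by zero
        for name, errors in errors_by_model.items()
    }
    total = sum(precisions.values())
    return {name: p / total for name, p in precisions.items()}


def ensemble(predictions, weights):
    """Combine the models' latest predictions using the given weights."""
    return sum(weights[name] * predictions[name] for name in predictions)


past_errors = {"orbit": [1.0, -1.0], "naive": [2.0, -2.0]}
weights = precision_weights(past_errors)
print(weights, ensemble({"orbit": 10.0, "naive": 20.0}, weights))
```

A model whose recent errors grow loses weight automatically, which is what lets the simpler models take over when the main one misbehaves.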

Another advantage of one-line forecasting functions is that they can be easily integrated into larger systems or workflows without fuss. For example, they can be used to generate forecasts for multiple time-series data sets in a batch processing pipeline, or incorporated into a dashboard or visualization tool to provide real-time forecasts.

Here’s an example of why standardization on a style of one-line prediction is helpful:
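The original example is not reproduced here. As a hedged illustration of the point, the sketch below assumes every forecaster shares one hypothetical signature, taking the history and returning the next value, so that any of them can be scored by the same loop. This signature is an assumption for illustration and is not the timemachines calling convention.

```python
from statistics import fmean


def last_value(ys):
    """Predict that the next value equals the most recent one."""
    return ys[-1]


def running_mean(ys):
    """Predict the mean of everything seen so far."""
    return fmean(ys)


def mean_absolute_error(ys, forecaster, burn_in):
    """Walk forward through ys and average the one-step-ahead absolute errors."""
    errors = [abs(forecaster(ys[:t]) - ys[t]) for t in range(burn_in, len(ys))]
    return fmean(errors)


def compare(forecasters, ys, burn_in=2):
    """Score every forecaster with the same loop; possible only because they share a signature."""
    return {name: mean_absolute_error(ys, f, burn_in) for name, f in forecasters.items()}


scores = compare({"last": last_value, "mean": running_mean}, [1, 2, 3, 4, 5, 6])
print(scores)
```

Adding another model to the comparison is then one more dictionary entry, with no changes to the evaluation code.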

Michael Rowe

Data scientist on health leave. I value constructive interactions. I enjoy probability, statistics and contributing to open source.