Wednesday, April 7, 2021

Weird aspects of ARIMA - how to increase the accuracy of predictions by exogenous data

ARIMA models are commonly used to predict time series. When the ARIMA model is properly chosen, the prediction error is often so small that finding better predictions with more sophisticated methods becomes a very difficult task. Since this is not an introductory note on ARIMA models, I replace the typical introduction with a reference that describes the method much better: "Forecasting: Principles and Practice" (2nd ed.) by Rob J Hyndman and George Athanasopoulos.
One of the variants of ARIMA models is a version using exogenous data, to which this note is dedicated.
It is not widely known that this version of the ARIMA model depends strongly on the factor by which we multiply the exogenous data. Generalizing, we can say that the larger the factor, the smaller the prediction error of the model.

To begin with, let us present a heuristic argument that the factor determining the ratio between the exogenous data and the target can influence the prediction error of the algorithm.

For simplicity, let us omit the part of the time series description which in ARIMA models is responsible for differencing, i.e. we assume that the parameter $d=0$. In other words, we will use the formulation of the ARMAX model.
Then a given time series $X$, in ARMAX, can be expressed generally as: \begin{equation} \label{eq1} X_{t}= c + \epsilon_{t} + AR_{t} + MA_{t} + exog_{t} \end{equation} where $c$: constant, $\epsilon_{t}$: white noise, $AR_{t}$: autoregressive part, $MA_{t}$: moving-average part, $exog_{t}$: exogenous variable(s).
Now, let's define the exogenous part $exog_{t}$ as: \begin{equation} \label{eq2} exog_{t} = X_{t} \alpha + \gamma_{t} \end{equation} where $\alpha$ is some proportionality factor and $\gamma_{t}$ is white noise.
Therefore, introducing eq. \ref{eq2} into eq. \ref{eq1} we get: \begin{equation} \label{eq3} X_{t} = c + \epsilon_{t} + AR_{t} + MA_{t} + X_{t} \alpha + \gamma_{t} \end{equation} and after rearranging the terms: \begin{equation} \label{eq4} X_{t}\left(1-\alpha\right) = c + \left(\epsilon_{t} + \gamma_{t}\right) + AR_{t} + MA_{t} \end{equation} But the part $AR_{t}$ can be written as $AR_{t} = \sum_{i=1}^{p} \phi_{i} X_{t-i}$ and similarly the $MA_{t}$ component as $MA_{t} = \langle X \rangle + \beta_{t} + \sum_{i=1}^{q} \theta_{i} \beta_{t-i}$, with $\beta_{t}$: white noise, $\langle X \rangle$: the expectation value of $X$, and $\phi_{i}$, $\theta_{i}$: parameters of the model.
With the above in mind, dividing eq. \ref{eq4} by $1-\alpha$ gives: \begin{equation} \label{eq5} X_{t} = \frac{c + \epsilon_{t} + \gamma_{t} + \alpha \langle X \rangle}{1-\alpha} + \sum_{i=1}^{p} \hat{\phi}_{i} X_{t-i} + \langle X \rangle + \hat{\beta}_{t} + \sum_{i=1}^{q} \hat{\theta}_{i} \beta_{t-i} \end{equation} where I introduced the notation $\hat{\beta}_{t} = \frac{\beta_{t}}{1-\alpha}$, $\hat{\phi}_{i} = \frac{\phi_{i}}{1-\alpha}$ and $\hat{\theta}_{i} = \frac{\theta_{i}}{1-\alpha}$, and the term $\frac{\langle X \rangle}{1-\alpha}$ was split as $\langle X \rangle + \frac{\alpha \langle X \rangle}{1-\alpha}$.
Using the definitions of $AR_{t}$ and $MA_{t}$ we can rewrite eq. \ref{eq5} into the final form: \begin{equation} \label{eq6} X_{t} = \frac{c + \epsilon_{t} + \gamma_{t} + \alpha \langle X \rangle}{1-\alpha} + \hat{AR}_{t} + \hat{MA}_{t} \end{equation} where $\hat{AR}_{t} = \sum_{i=1}^{p} \hat{\phi}_{i} X_{t-i}$ and $\hat{MA}_{t} = \langle X \rangle + \hat{\beta}_{t} + \sum_{i=1}^{q} \hat{\theta}_{i} \beta_{t-i}$ correspond to the definitions of $AR_{t}$ and $MA_{t}$ but with the rescaled coefficients.
The first term of eq. \ref{eq6} is the most interesting one!
In the case of ARIMA without exogenous data, the forecasting error is determined by the white noise $\epsilon_{t}$. Now, this error is replaced by the expression \begin{equation} \label{eq7} \epsilon_{t} \longrightarrow \frac{\epsilon_{t} + \gamma_{t}}{1-\alpha} \end{equation}
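The replacement in eq. \ref{eq7} can be checked numerically: the standard deviation of $\frac{\epsilon_{t} + \gamma_{t}}{1-\alpha}$ relative to that of $\epsilon_{t}$ tells us whether the effective noise grew or shrank. A minimal sketch (the unit noise scales are an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(size=100_000)    # epsilon_t, the original white noise
gamma = rng.normal(size=100_000)  # gamma_t, noise of the exogenous data

for alpha in (-1.0, 0.5, 3.0):
    # Effective noise after introducing the exogenous term, eq. (7).
    effective = (eps + gamma) / (1.0 - alpha)
    print(f"alpha={alpha:+.1f}: std ratio = {effective.std() / eps.std():.3f}")
```

For $\alpha = 0.5$ the ratio is larger than one (noise amplified), while for $\alpha = -1$ and $\alpha = 3$ it is smaller than one (noise reduced).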
We thus reach the final conclusions:
  1. using an exogenous variable that is, up to noise, a scaled copy of the target is equivalent to a model without exogenous variables (but with rescaled model parameters and noise terms); the exogenous data itself adds no new information.
  2. scaling the exogenous data, i.e. changing the ratio $\frac{\text{exogenous data}}{\text{target}}$ and hence $\alpha$, significantly modifies the final prediction error of the model: the error is now multiplied by the factor $\frac{1}{1-\alpha}$. So, ignoring the extra noise $\gamma_{t}$,
    1. we have an error explosion for $\alpha \approx 1$,
    2. for $\left|1-\alpha\right| < 1$ (i.e. $0 < \alpha < 2$): error with exogenous data $>$ error without it,
    3. for $\left|1-\alpha\right| > 1$ (i.e. $\alpha < 0$ or $\alpha > 2$): error with exogenous data $<$ error without it.
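The regimes above follow directly from the magnitude of the error-scaling factor $\left|\frac{1}{1-\alpha}\right|$ in eq. \ref{eq7}; the particular $\alpha$ values in this sketch are arbitrary illustrations:

```python
import numpy as np

# Magnitude of the error-scaling factor 1/(1-alpha) from eq. (7),
# for a few arbitrary alpha values covering the different regimes.
alphas = np.array([-5.0, -1.0, 0.5, 0.9, 1.1, 2.5, 5.0])
amplification = 1.0 / np.abs(1.0 - alphas)

for a, f in zip(alphas, amplification):
    regime = "amplified" if f > 1.0 else "reduced"
    print(f"alpha={a:+.1f}  |1/(1-alpha)|={f:.2f}  -> error {regime}")
```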
The above derivation was made for the simplified case without the differencing term (i.e. for $d, D = 0$). I leave it to the reader to generalize the above formalism to a model with differencing parts.

The next step is to verify these hypotheses in practice. A practical example of the implementation of the discussed hypothesis is available in the form of a Jupyter notebook:
Here, we just present the dependence of the MAPE error on the exogenous data factor (on a log10 scale). As you can see, by choosing an appropriate value of the factor we are able to reduce the prediction error by almost half! The error reaches its minimum at a factor value of 4000000 and then increases again. This increasing part beyond the minimum does not follow directly from the theoretical argument above; I will try to explain it in the next part of the article.
All other details are available in the code:

Thank you for reading!