# Comments on Brightwork Articles on Outliers in Forecasting

## Introduction

These comments are in response to the articles on outliers in forecasting.

## Comment #1: Tim Reilly

Shaun,

Great article.

I just want to second your point about finding an application that is good at doing this. For example, your equation is the classic regression equation (i.e., y = a + bx). Most software will use that to do causal modeling. The problem is that regression assumes that the first and last observations have equal importance. Regression ignores time, and so it ignores the dependence between successive observations, which in time series analysis is called “autocorrelation”. Regression is meant for cross-sectional analysis, not time series. You need to use a transfer function modeling approach, where you weight the historical observations to reflect changes in the relationship over time. The relationship could be between the causal variable and sales, or just within the history of sales itself (i.e., seasonality, etc.).
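To illustrate the autocorrelation point, here is a minimal sketch that fits y = a + bx by ordinary least squares and then checks whether the residuals are correlated with their own previous values. The data, the `drift` term, and the helper names `ols_fit` and `lag1_autocorrelation` are all made up for this example; it is not a transfer function model, only a way to see the problem that such a model is meant to address.

```python
def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def lag1_autocorrelation(values):
    """Correlation of a series with itself shifted by one period."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    num = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    den = sum(c * c for c in centered)
    return num / den


# Hypothetical data: sales respond to a causal variable x, plus a slow
# drift in the sales level that the regression knows nothing about.
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
drift = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
y = [2 * xi + di for xi, di in zip(x, drift)]

a, b = ols_fit(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
r1 = lag1_autocorrelation(residuals)

print(f"intercept={a:.2f}, slope={b:.2f}, lag-1 residual autocorrelation={r1:.2f}")
```

A lag-1 value well above zero means each residual tends to resemble the one before it, so the time ordering carries information that the plain regression has left unmodeled.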

Another complicating factor is the lead and lag relationships between the causal variable and sales. You need to consider not just the contemporaneous relationship but also the leads and lags, as people don’t buy beverages on New Year’s Eve but in the days leading up to it.
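One rough way to look for such lead relationships is to correlate sales with the causal variable shifted by different numbers of days. The sketch below assumes a hypothetical daily `event` indicator and made-up `sales` figures in which buying peaks the day before each event; `pearson` and `lead_correlations` are illustrative helpers, not part of any forecasting package.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def lead_correlations(sales, event, max_lead):
    """Correlate sales on day t with the event indicator on day t + k."""
    result = {}
    for k in range(max_lead + 1):
        n = len(sales) - k
        result[k] = pearson(sales[:n], event[k:k + n])
    return result


# Hypothetical data: events on days 5 and 12, with sales rising two days
# before each event, peaking one day before, and flat on the day itself.
sales = [10, 10, 10, 14, 20, 10, 10, 10, 10, 10, 14, 20, 10, 10, 10]
event = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

correlations = lead_correlations(sales, event, max_lead=3)
best_lead = max(correlations, key=correlations.get)

for k, r in correlations.items():
    print(f"sales lead the event by {k} day(s): r = {r:.2f}")
print("strongest relationship at lead:", best_lead)
```

With this data the same-day correlation is negative while the one-day lead is strongly positive, which is the pattern a contemporaneous-only model would miss.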

The implications of not adjusting for outliers have been well documented in many statistical journals. I will point you to the great work of Ruey Tsay here: https://www.unc.edu/~jbhill/tsay.pdf

Your discussion of financial data and Nutrasweet is understood, but when it comes to supply chain, adjusting for outliers is critical. And it is equally important how you identify them! As you point out, most systems use a simple approach: they call an observation an outlier when it is 2 or 3 standard deviations from the mean, and then ask you how many iterations of removing and adjusting they should perform. This approach is too simple and misses other important outliers that distort the model and forecast. You need to identify the outliers while you are building the model AND do a final check of 2 or 3 standard deviations at the end of the process. A fun example that we like to torture our competition with is the series 1,9,1,9,1,9,1,5. Where is the outlier? We can see that the 5 is unusual, and we could call it an inlier, as it is “too good to be true” and sits near the mean. Simple outlier schemes completely miss this outlier, and the forecast suffers. The 1,9 example is contrived, but it reflects something that happens in datasets we see all the time.
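The 1,9 series can be checked directly. The sketch below applies a 2-standard-deviation rule to the raw values, then looks at residuals from a deliberately simple pattern model that predicts each value with the value two periods earlier. That pattern model is an assumption chosen for illustration, standing in for identifying outliers while the model is being built; it is not the method any particular forecasting package uses, and the function names are hypothetical.

```python
import statistics

series = [1, 9, 1, 9, 1, 9, 1, 5]


def zscore_outliers(values, threshold=2.0):
    """Indices whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]


def seasonal_naive_residuals(values, period=2):
    """Residuals from predicting each value with the value `period` steps earlier."""
    return [values[t] - values[t - period] for t in range(period, len(values))]


period = 2
simple_flags = zscore_outliers(series)
residuals = seasonal_naive_residuals(series, period)

# Position in the original series of the largest absolute residual.
suspect = max(range(len(residuals)), key=lambda i: abs(residuals[i])) + period

print("flagged by the 2-sigma rule:", simple_flags)
print("residuals from the pattern model:", residuals)
print("most suspicious observation: index", suspect, "value", series[suspect])
```

The mean of the series is 4.5, so the 5 sits almost exactly on it and the 2-sigma rule flags nothing. Once the alternating pattern is modeled, every residual is zero except the last one, which is what exposes the 5.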
