In this series about Control Charts in Tableau, we have learnt the basics about creating them, understood the extra calculations to identify a single outlier and parameterise the control limit calculations and learnt about an alternative type of control chart (P-Chart).
In this blog post we are going a step further than identifying a single outlier, we will implement a series of calculations that identify signals in the data.
Control charts come from a discipline called Statistical Process Control and Six Sigma. Six sigma standing for the 3 Standard Deviations (population) below and above the mean that the rule set uses, the rules are in place to identify a number of different 'signals' in time series data.
The original rules used for signal detection were put in place by Western Electric in 1956, to ensure their employees interpreted control charts in a uniform way. In 1984 these rules were updated by Lloyd S Nelson.
The calculations involved for the most part require the nesting of logic statements within Table Calculations (that are themselves nested), so what I will do is start from the logic and move outwards adding each layer.
If we take Nelson Rule 2 - '9 or more points consecutively above or below the mean.'
1. The Basic logic calculation is; 'Is the point being evaluated above the mean'
IIF(SUM([Value]) > WINDOW_AVG(SUM([Value])), 1, 0 )
From now on i'll refer to the above statement as [Logic1], as the scenario requires us to test for 9 above or 9 below lets assume I have also created a second statement to test the 9 below and call it [Logic2].
2. Next we need to check that at least 9 points are consecutively true;
WINDOW_SUM( [Logic1] ) , -9+1, 0) = 9
In Table Calculations term we are taking a range of 8 points before the current one and the current one (making 9), this is the '-9+1,0)' element of the formula, I have left it as -9+1 for my own benefit so i am clear when reusing the calculation that the range goes where the 9 is. Adding together the output of the logic statement that gives us a 1 or 0. We want this to always be true therefore the sum must equal 9.
3. If we need to apply multiple logic statements, as in this case we can wrap it into a larger IIF function;
IIF( index()>9
AND
(
WINDOW_SUM( [Logic1] ), -9+1, 0) = 9
OR
WINDOW_SUM( [Logic2] ), -9+1, 0) = 9
)
, 1, 0 )
In this case we have [Logic1] where 9 consecutive points are above the mean or [Logic 2] where 9 consecutive points are below the mean, both wrapped in the window_sum looking back 9 periods. In addition i have added index()>9 so that the output is always 1 or 0, as normally if a table calculation can't complete due not enough data points you get a null, so in this case the statement can't be properly completed until at least the 9th value exists. This statement will give you a 1 on the last point in the series when the logic statements are all true.
Now we could stop here and simply identify the last point in a sequence that meets a criteria - but why not add in the ability to highlight all of them?
4. To do this we wrap the above calculations in another table calculation.
WINDOW_MAX( [Stacked Logic] , 0, 9-1 )
By wrapping the stacked logic in a window_max and again with a defined range for that window max to be applied ',0,9-1)' - read as this value to 9-1 values ahead, we are able to duplicate the 1 from the last point in the range back to all the members of the range. (The range in this case is 9 points).
So that is it in this example calculation, below are the calculations written out in full for each of the Nelson Rules for process control charts, using the above structural information hopefully you'll be able to deconstruct the formulae.
If not and you just want them to work, you can copy and paste them replacing [Value] with the measure you are interested in, and assuming your charts are horizontal timelines set your table calculations to 'Table Across' - this should be the default setting.
Rule | Purpose | Calculation |
Nelson Rule 1 (Western Electric Rule 1) |
||
One point is more than 3 standard deviations from the Mean. |
One period is grossly out of control. |
INT( |
Nelson Rule 2 (Western Electric Rule 4) |
||
Nine (or more) points in a row are on the same side of the mean. |
Some prolonged bias exists. |
WINDOW_MAX(
|
Nelson Rule 3 |
||
Six (or more) points in a row are continually increasing (or decreasing).
|
A trend exists. |
WINDOW_MAX(
|
Nelson Rule 4 |
||
Fourteen (or more) points in a row alternate in direction, increasing then decreasing. |
This much oscillation is beyond noise. Note that the rule is concerned with directionality only. The position of the mean and the size of the standard deviation have no bearing. |
WINDOW_MAX(
|
Nelson Rule 5 (Western Electric Rule 2) |
||
Two (or three) out of three points in a row are more than 2 standard deviations from the mean in the same direction. |
There is a medium tendency for samples to be partially out of control. |
WINDOW_MAX(
|
Nelson Rule 6 (Western Electric Rule 3) |
||
Four (or five) out of five points in a row are more than 1 standard deviation from the mean in the same direction. |
There is a strong tendency for samples to be slightly out of control. |
WINDOW_MAX( |
Nelson Rule 7 |
||
Fifteen points in a row are all within 1 standard deviation of the mean on either side of the mean. |
With 1 standard deviation, greater variation would be expected. |
WINDOW_MAX(
|
Nelson Rule 8 |
||
Eight points in a row exist, but none within 1 standard deviation of the mean, and the points are in both directions from the mean. |
Jumping from above to below whilst missing the first standard deviation band is rarely random. |
WINDOW_MAX(
|
The next thing to do to use these is to combine them into a single calculation that you can put on colour size etc.
IF |
By this point in the series you should be pretty comfortable with building Control Charts, so I wont cover that again, but if you use the above calculation on size and colour, using the sample data set (See download link below) you should end up with something like this;
One thing I really struggled with in writing this blog post was finding a data set I could use to demonstrate each rule, where a single rule is broken but the others are not, I couldn't find one so created one which I am more than happy to share. - WE_Nelson_rules_Data.
To see this and the other charts described in this blog series you can download the workbook - SPC blog. - This workbook is version 2018.3, if you have an older version of tableau please feel free to comment below with the version you need and I will endeavour to send a downgraded version of the workbook to you.
So in the immortal words of Forest Gump - 'That is all I have to say about that.'
I hope this blog series has been informative/useful to some, I personally have learnt about the various different Control Chart types and found the time to work out the Table Calculations in this blog - so it has been a useful experience personally.
To find out more about how to do more complex analyses in Tableau...