Adam Laiacano

I'm a data engineer working here at Tumblr, living in Brooklyn.

2011-10-12

A quick outlier detector for streaming data

Batch processing is great and all, but nobody wants to wait until the next day to find out that their process is lagging. Here’s a quick-and-dirty script to detect outliers.  

It’s just a moving average filter where I also calculate the variance of the window at each data point. I hard-coded the definition of “outlier” as being more than three standard deviations away from the mean (about 99.7% of normally distributed data falls within that range), but that can be changed.
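To make the idea concrete, here is a minimal Python sketch of that kind of filter. It is an illustration under my own assumptions, not the original script: the class name `StreamingOutlierDetector`, the `window_size` and `n_sigmas` parameters, and the choice to compare each new point against the window *before* adding it are all hypothetical.

```python
from collections import deque
from math import sqrt


class StreamingOutlierDetector:
    """Moving-average filter that flags points far from the window mean.

    Hypothetical sketch: names and defaults are assumptions.
    """

    def __init__(self, window_size=50, n_sigmas=3.0):
        # deque with maxlen drops the oldest point automatically
        self.window = deque(maxlen=window_size)
        self.n_sigmas = n_sigmas

    def update(self, x):
        """Judge x against the current window, then add x to the window.

        Returns (is_outlier, mean, std); mean and std are None until the
        window holds at least two points.
        """
        n = len(self.window)
        is_outlier, mean, std = False, None, None
        if n >= 2:
            mean = sum(self.window) / n
            # sample variance of the points currently in the window
            variance = sum((v - mean) ** 2 for v in self.window) / (n - 1)
            std = sqrt(variance)
            is_outlier = abs(x - mean) > self.n_sigmas * std
        # Assumption: outliers are still added to the window, so the
        # filter can adapt when the signal shifts to a new level.
        self.window.append(x)
        return is_outlier, mean, std
```

Each call to `update` handles one incoming point, so it can sit directly in a stream consumer. Recomputing the mean and variance over the whole window on every call is fine for small windows; a running sum and sum of squares would be the usual swap for large ones.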

The plot below shows a noisy sine wave with a discrete jump. The black line is the moving average, and anything outside of the green lines is considered an outlier.
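A test signal along those lines could be generated roughly like this. This is only a sketch: the function name `noisy_sine_with_jump` and every parameter value (period, noise level, jump size, jump position) are assumptions, not the values behind the plot.

```python
import math
import random


def noisy_sine_with_jump(n_points=1000, jump_at=500, jump_size=3.0,
                         period=200, noise_sd=0.2, seed=0):
    """Return a list of samples: sine wave + Gaussian noise + a step.

    All defaults are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so the signal is reproducible
    signal = []
    for i in range(n_points):
        value = math.sin(2 * math.pi * i / period) + rng.gauss(0.0, noise_sd)
        if i >= jump_at:
            value += jump_size  # discrete jump in the signal level
        signal.append(value)
    return signal
```

Feeding a signal like this through a moving-window detector one point at a time is enough to reproduce the behavior described here: a burst of flagged points at the jump, then none once the window has filled with the new level.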

If you look at the transition, you can see that a bunch of points get marked as outliers, but then the filter adjusts. Here’s a detail of that transition.

And here’s the code:

