Monday, October 14, 2013

SQL Moving Average Query Techniques and Performance Comparison

SQL-Server Solutions to Moving-Average Query Requirements


Moving-Averages Overview

This article references future blog posts which present the source-code for several additional SQL-Server queries, functions, and stored procedures — these used to be hosted on my external Free SQL code library, which was migrated here in 2016/17. Among this set of SQL queries are a few pieces of code that address the topic of moving averages and approaches to calculating moving average values using only SQL.

Moving averages — and/or rolling averages, moving means, sliding averages, running averages, temporal averages, etc.  — are a common data requirement in these problem domains:
  • financial instrument pricing, and especially the analysis of financial time-series values and information: e.g., stock prices, commodity prices, interest-rates, etc. where it is very common to look at indicators like a 30-day or 90-day moving-average-price
  • operational indicators at a business: production and sales volume trends, error/defect rates trends, and so forth
  • economic indicators like unemployment and jobless claims, new home construction rates
  • environmental indicators like seasonal temperature trends, rainfall, snowfall, and so forth
  • and countless other applications where "smoothing" or "trending" or "momentum" is to be included in some analysis or reporting or charting.
Whether your moving-average needs are best served by SQL-only algorithms, as opposed to perhaps client-side number-crunching and/or non-SQL Server-side code (like .NET CLR procedures), is for you to decide, but with the techniques I have demonstrated, at least you can consider a Transact-SQL-only solution if you choose (and one that is set-based as opposed to relying on cursors).

SQL-only Solutions Compared

See my Set-based Moving-Averages SQL Example 1 page, as I will reference the techniques on that page in my discussion here.

That page and example includes execution-speed / performance comparisons between techniques also (run against the AdventureWorks sample database).  That page demonstrates three different approaches to solving (using only SQL-Server T-SQL) the moving average challenge using:
  1. my own custom approach to calculating a rolling average that takes advantage of some interesting features of Transact-SQL that, although may not work "forever" with every future release of SQL-Server, have worked fine from SQL-Server 2005 through SQL-Server 2008 and 2008r2 and SQL-Server 2012.

    The technique relies on CASE statements and a (so far, so good) predictable order-of-processing the values-assignments within the UPDATE statement where I use local-variables to maintain a "queue" of values as well as break-level-reset tracking information.  This code is quite speedy and flexible and easily adapted to related problem domains.

  2. an example of solving the same problem using a CTE (Common Table Expression) coupled with APPLY (CROSS APPLY / OUTER APPLY that are Microsoft SQL-Server specific extensions to ANSI SQL).  This approach is SLOW!

  3. A newer ANSI-SQL compatible solution that is available to you if you have SQL-Server 2012 or newer that implements the windowing functions using OVER and PARTITION BY.  This method is the simplest solution, is the fastest running, but requires SQL2012+ and also lacks the ability to easily change the "window size" (i.e., number of data-points the moving-average is based upon) without dynamic SQL.

Performance Results / Limitations Examined 

I have been very pleased with the performance of my custom solution to this problem and have used this approach (#1 above) for years now.  Sure, I could simplify my code using SQL-Server 2012's new OVER / PARTITION BY abilities while also gaining a performance boost.  But, until SQL-Server makes the PARTITION window-size (in the "ROWS BETWEEN integer_value_here PRECEDING AND CURRENT ROW" portion of the command) able to utilize a variable instead of a hard-coded integer value, this newest SQL method has limited value in my opinion.

My custom approach comes amazingly close to the native speed of the newer SQL-2012 windowing features, while also having the advantage of being able to handle a variable-window-size (which is perfect for requirements where you want to pass a moving-average-window-size value as a parameter to the procedure).

Notice that I hardly give the APPLY / CTE "solution" much thought.  I presented it as an "option" for a set-based solution to this problem, but its performance is awful and the APPLY is Microsoft-specific as well.  So, I might as well use my own custom approach that may rely on the specifics of MS T-SQL processing-order while gaining the advantage of speed and a rather simple-to-understand approach (as compared to the CTE/APPLY which I find not-readable or easily understandable by comparison).

Conclusion

There are certainly multiple ways to solve the rolling-average / moving-average problem using only SQL, and the choice is yours as to which approach best fits with your needs and situation.  I'd say that if you have a static-window-size and are using SQL-Server 2012 or newer, then the obvious choice is the newest OVER / PARTITION approach.  But, I personally have not run into rolling-average requirements that did not include a variable "window" (e.g., allowing a user to choose how many data-points, periods, etc to include in the average, as specified in a run-time query parameter).

Check out the other SQL queries, functions, and stored procedures I have in provided here on my blog for other related problems and solutions like running subtotals and much more.  I have made it all available as open-source-software (OSS) under a very permissive MIT License.  If you want to learn how to compute moving averages using only SQL, and solve other interesting problems within Microsoft Transact-SQL, this resource should provide some very useful information and ideas. Enjoy.


Continue to read this Software Development and Technology Blog for computer programming articles (including useful free / OSS source-code and algorithms), software development insights, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Delphi, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, plus my varied political and economic opinions.