Friday, January 20, 2017

SQL-Server Procedure for Parameterized GROUP BY Without Using Dynamic-SQL

SQL SERVER Procedure: Parameterized GROUP BY Without Using Dynamic-SQL : the "impossible" is possible.

NOTE: This procedure and source code builds upon my previous parameterized ORDER BY without using Dynamic-SQL examples:

In this blog entry, I present a Microsoft SQL-Server query / procedure that can accomplish what many would consider truly impossible: performing a parameterized GROUP BY without using Dynamic-SQL to do it. This method builds on some other examples I have provided demonstrating parameterized ORDER BY operations (also without using Dynamic-SQL). This method of performing a GROUP BY on one or more columns, as specified by parameters/directives, is very flexible though it has the drawback of not executing as quickly as if we knew ahead of time (when writing our SQL code) what column(s) an ORDER BY / GROUP BY / aggregation was to act on.

Avoidance of Dynamic-SQL is primarily motivated by security considerations, and this procedure does not use any dynamic-SQL to perform what would otherwise be considered a "dynamic GROUP BY" operation, where the column(s) for sorting and grouping on are variable. SQL-injection exploits can be avoided completely by not using dynamic-SQL. This is particularly important for applications that provide the "public" with access to a database via web-interfaces and the like. And, very often, data-selection and sorting options are provided to the public (or are a desired application feature) that would typically make dynamic-SQL a requirement.

DISCUSSION:

Just in case (pun) you wondered how far can SQL-Server be pushed by using CASE statements to accomplish what would otherwise only be possible with Dynamic-SQL, I put together this procedural example that demonstrates how you can even perform GROUP BY (i.e., for aggregation operations) dynamically via run-time parameters, without dynamic-SQL.

Not only does this example demonstrate how, at execution time, I can force the GROUP BY operation to vary what column is used to group information by, but how I can also cause the aggregation level to happen for one column, two columns, three columns, etc.

This can be some wonderful time-saving code for management reporting where "drill-down" capabilities are required in an application, and simply by specifying additional column(s) to slice and dice our data by, we can view ever increasing detail and be quite flexible about the grouping of that data.

Although it is really an interesting experiment in regards to what is possible with non-dynamic set-based SQL using a single SQL SELECT to perform a tricky piece of logic, there are definitely some drawbacks to the approach. First, there is the need to essentially duplicate a large portion of the logic (pattern) within the SELECT result-column definitions as well as the ORDER BY and the GROUP BY column definitions, since we need to handle all possible cases without dynamic SQL.

NOTE: even though this works and works completely, it is presented as a "proof of concept" solution more than anything, due to the fact it is rather inefficient when used against large tables. The query-optimizer can not do much to help with performance when everything about the query is unknown until run-time (as each row has its table-column values evaluated).

But, now that I have presented a quick argument against this techniques use, I will consider a few of the reasons it is actually a worthwhile approach:

  • We can achieve, with a single procedure, what would otherwise take many procedures to accomplish (presuming dynamic-SQL is not an option)
  • Although the code may appear a bit lengthy, it is not terribly difficult to maintain, as changes are made to the overall "pattern" of the code within each major section — meaning, if you change the format of an output column in one CASE condition, it's a simple matter of copying and pasting that change to the other region(s) where it is used, with minor changes to the surrounding code; 
  • It is easily extended to include additional grouping levels, should that be necessary.


This procedure has been tested against the AdventureWorks sample databases in SQL-Server 2005 and SQL-Server 2008.

SQL-Server Stored Procedure (SP) Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2017
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************

--**********************************************************************************************
-- BEGIN: NON-DYNAMIC-SQL for Dynamically/Variably-Grouped data at run-time via Parameters.
--
-- This procedure builds on the prior example(s) by adding the ability to perform aggregate
-- operations dynamically on one or more columns as directed by parameters.
--**********************************************************************************************

--**********************************************************************************************
-- Using the AdventureWorks database's Production.TransactionHistory Table for this example.
-- I will refer to the various Columns I want to (potentialy) GROUP BY as columns 1 through 3,
-- with the values being assigned as follows:
-- 1 = ProductID
-- 2 = ReferenceOrderID
-- 3 = TransactionDate
--**********************************************************************************************
CREATE PROCEDURE uspProducts_ReturnProductsDynamicallyGrouped
 @GroupColumn1  TINYINT,
 @GroupColumn2  TINYINT,
 @GroupColumn3  TINYINT
AS
BEGIN
 SELECT
  MAX
  (CONVERT(VARCHAR(80),
   COALESCE(
   CASE
    WHEN @GroupColumn1 = 1 THEN 
     'ProductID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, ' ')
    WHEN @GroupColumn1 = 2 THEN
     'ReferenceOrderID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, ' ')
    WHEN @GroupColumn1 = 3 THEN
     'TransactionDate: ' + 
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END, '')
  +
   COALESCE(
   CASE
    WHEN @GroupColumn2 = 1 THEN 
     ', ProductID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, ' ')
    WHEN @GroupColumn2 = 2 THEN
     ', ReferenceOrderID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, ' ')
    WHEN @GroupColumn2 = 3 THEN
     ', TransactionDate: ' + 
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END, '')
  +
   COALESCE(
   CASE
    WHEN @GroupColumn3 = 1 THEN 
     ', ProductID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, ' ')
    WHEN @GroupColumn3 = 2 THEN
     ', ReferenceOrderID: ' + 
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, ' ')
    WHEN @GroupColumn3 = 3 THEN
     ', TransactionDate: ' + 
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END, '')
  ))     AS 'Grouping-Value',
  COUNT(1)   AS 'Count', 
  SUM(ActualCost)  AS 'Total Cost' 
 FROM 
  Production.TransactionHistory AS T
 GROUP BY 
  (
   CASE
    WHEN @GroupColumn1 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn1 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn1 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  ),
  (
   CASE
    WHEN @GroupColumn2 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn2 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn2 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  ),
  (
   CASE
    WHEN @GroupColumn3 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn3 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn3 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  )
 ORDER BY 
  (
   CASE
    WHEN @GroupColumn1 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn1 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn1 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  ),
  (
   CASE
    WHEN @GroupColumn2 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn2 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn2 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  ),
  (
   CASE
    WHEN @GroupColumn3 = 1 THEN 
     dbo.[udfGetIntegerAsPaddedString](T.ProductID, 8, '0')
    WHEN @GroupColumn3 = 2 THEN
     dbo.[udfGetIntegerAsPaddedString](T.ReferenceOrderID, 8, '0')
    WHEN @GroupColumn3 = 3 THEN
     dbo.udfConvertDateToString(T.TransactionDate)
    ELSE NULL
   END
  )
END --Procedure



--**********************************************************************************************
--TESTING / EXAMPLE EXECUTIONS
--**********************************************************************************************

--No grouping: get grand totals only
EXEC uspProducts_ReturnProductsDynamicallyGrouped NULL, NULL, NULL
--Grouping-Value                                              Count       Total Cost
------------------------------------------------------------- ----------- ---------------------
--                                                            113443      27307331.1537


--Group totals by TransactionDate
EXEC uspProducts_ReturnProductsDynamicallyGrouped 3, NULL, NULL
--Grouping-Value                                              Count       Total Cost
------------------------------------------------------------- ----------- ---------------------
--TransactionDate: 20030901                                   3944        1675058.5669
--TransactionDate: 20030902                                   227         39771.10
--TransactionDate: 20030903                                   203         39824.63
--...


--Group totals by TransactionDate and ProductID
EXEC uspProducts_ReturnProductsDynamicallyGrouped 3, 1, NULL
--Grouping-Value                                              Count       Total Cost
------------------------------------------------------------- ----------- ---------------------
--...
--TransactionDate: 20030901, ProductID:      996              32          2786.9806
--TransactionDate: 20030901, ProductID:      997              10          4039.1252
--TransactionDate: 20030901, ProductID:      998              26          9871.0172
--TransactionDate: 20030901, ProductID:      999              22          8553.4416
--TransactionDate: 20030902, ProductID:        3              1           0.00
--TransactionDate: 20030902, ProductID:      316              1           0.00
--...


--Group totals by ProductID and TransactionDate (and show how NULL parms do not affect outcome)
EXEC uspProducts_ReturnProductsDynamicallyGrouped 1, NULL, 3
--Grouping-Value                                              Count       Total Cost
------------------------------------------------------------- ----------- ---------------------
--...
--ProductID:      971, TransactionDate: 20040622              1           0.00
--ProductID:      971, TransactionDate: 20040626              1           1214.85
--ProductID:      971, TransactionDate: 20040629              3           2429.70
--ProductID:      971, TransactionDate: 20040702              1           0.00
--ProductID:      972, TransactionDate: 20030901              31          26240.76
--ProductID:      972, TransactionDate: 20030904              1           0.00
--...


--**********************************************************************************************
-- END: NON-DYNAMIC-SQL for Dynamically/Variably-Grouped data at run-time via Parameters.
--**********************************************************************************************


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: