Friday, January 20, 2017

Transact-SQL: Set-Based SQL Technique for Running-List of Accumulated Values (String/Number), with Break-Level Resets

MS SQL Server: Algorithm and Query for Set-Based Running Accumulators (i.e., Accumulate Row-Level Values across multiple Rows) into a String of Delimited-values, with Break-Level Resets

Have you ever wanted to perform a set-based running-subtotal operation or running-accumulation-of-values (across multiple rows) in SQL-Server — i.e., have row-level accumulators store a running total or running subtotal or aggregation of values within a "break level", stored in a String as delimited-values (i.e., delimited-list of values)? Want the solution to not use cursors?

Well, in this blog I present one such cursorless method of implementing that type of functionality in a SQL-Server Query, Stored-Procedure, or Function.

The source code provided here is for a Microsoft SQL-Server Transact-SQL (T-SQL) solution that provides a basis for a query / stored procedure (SP) that you wish to perform Set-Based Running String Accumulation with Break-Level Resets operations in. This demonstrates how to accumulate values from multiple rows in a table into a single string, and how to perform a "reset" of that accumulator at your chosen break-level.

This code is very handy for reporting requirements where you need to display on a report a single field whose value is really made up of the aggregation of values from multiple rows in a database. The example scenario used here is reporting all OrderLineIDs related to a product. The AdventureWorks database (sample) from Microsoft provided the basis and data for this example.

WHY DO THIS?

I have run into a few situations where this has been extremely useful. A typical situation involves reporting functionality, where a report is supposed to show a list of values that exist for all items in a group, but only report this information in summary at a group level. Like, e.g., you have a part number with (potentially) a lot of sub-parts / components that make up the "parent" product, and you want to generate a report showing (at the product level) information like price, cost, build-time, and a list of sub-components (in summary - like just their part numbers). Well, this query demonstrates one method for how that can be accomplished. Also, I have used this as a technique to "bind" two reporting procedures together where a master-detail report links the details as a comma-delimited list of primary-key values (and then, in the detail report, I use the power of a user-defined function to transform that delimited list of key-values into a table (link to source code here on my blog) for joining, using this user-defined delimited-list parser SQL function.

I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code I wrote that appears below can easily do both, and does. It is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.

Note: for this example, I used the SQL Server 2012 Adventureworks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and took only just under 6 seconds to execute on my early Core-i7 Desktop development PC within a VMware Workstation virtual machine, for the entire operation (before I limited output with TOP() function.

SQL-Server Query Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2017
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************

--**********************************************************************************************
-- BEGIN: SET-BASED Running-String-Accumulator Technique
--**********************************************************************************************
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Clears the data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @starttime DATETIME = GetDate();

--Variable to accumulate our delimited string of values-per-break-level
DECLARE @RunningSubtotalDelimString VARCHAR(MAX) = '';

--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @SubTotalBreakValue1 INT = -999; 

--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @Results TABLE
(
  UniqueID                INT IDENTITY NOT NULL PRIMARY KEY,
  SubtotalBreakColumn1    INT,
  ReferenceOrderID        INT,
  OrderLineIDsForProduct  VARCHAR(MAX)
);


--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @Results(
    SubtotalBreakColumn1,
    ReferenceOrderID)
SELECT
    ProductID,
    ReferenceOrderID
FROM
    Production.TransactionHistory
ORDER BY
    ProductID,        --Insert into table in our subtotal-breaking order (IMPORTANT!)
    ReferenceOrderID  --and, if we care, sort the OrderIDs too
;

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
--    See SET-BASED RUNNING SUBTOTALS Example 1, which this builds upon and can easily be
--    extended per the comments therein and/or as demonstrated in Subtotals Example 2.
--**********************************************************************************************
UPDATE
    @Results
SET
    @RunningSubtotalDelimString   = 
        OrderLineIDsForProduct    = 
            CASE 
            WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1 
                THEN @RunningSubtotalDelimString + 
                CASE WHEN @RunningSubtotalDelimString <> '' THEN ',' ELSE '' END +  
                CONVERT(VARCHAR(10), ReferenceOrderID) 
            ELSE CONVERT(VARCHAR(10), ReferenceOrderID) 
            END,
    @SubTotalBreakValue1= SubtotalBreakColumn1
;


SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
--Output the results, showing a few rows to demonstrate the accumulation...
--**********************************************************************************************
SELECT TOP(200) * FROM @results
ORDER BY UniqueID

--UniqueID    SubtotalBreakColumn1 ReferenceOrderID OrderLineIDsForProduct
------------- -------------------- ---------------- ---------------------------------------
--1           1                    426              426
--2           1                    505              426,505
--3           1                    588              426,505,588
--4           1                    675              426,505,588,675
--5           1                    758              426,505,588,675,758
--6           1                    841              426,505,588,675,758,841
--...
--...
--45          1                    3931             426,505,588,[...etc...],3852,3931
--46          2                    425              425
--47          2                    504              425,504
--48          2                    587              425,504,587
--49          2                    674              425,504,587,674
--...
--...


--**********************************************************************************************
--Perhaps we only want one row with the delimited list of unique values for entire break-level 
--(i.e., accumulator level, "subtotal level", aggregation level, running subtotal level)
--NOTE: Limit sample using TOP here.  Demonstrate GROUP BY with MAX to get just break-level val.
--**********************************************************************************************
SELECT TOP(10)
    SubtotalBreakColumn1,
    MAX(OrderLineIDsForProduct) AS AccumulatedOrderIDs  --The "MAX()" row has the most OrderIDs
FROM @results
GROUP BY SubtotalBreakColumn1
ORDER BY SubtotalBreakColumn1
;

--SubtotalBreakColumn1 AccumulatedOrderIDs 
---------------------- -------------------------------------------------------------------------
--1                    426,505,588,675,758,841,[...etc...]3457,3536,3615,3694,3773,3852,3931
--2                    425,504,587,674,757,836,[...etc...]3456,3535,3614,3693,3772,3851,3930

--**********************************************************************************************
-- END: SET-BASED Running-String-Accumulator Technique
--**********************************************************************************************


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

No comments: