Saturday, July 7, 2012

The Power of ANSI Window Functions

Tip: If you are in a hurry, just grab all the files from the "Summary" section. Run #1, then #2 and you'll get a result set that matches the sample output in file #3. File #4 tells you what the derived columns in the sample output mean.


Introduction:
This post shows how to use ANSI Windows Functions on any data.  In the next post, I will apply an ANSI window function to SAP data.

ANSI Window Functions are also called "ANSI analytic functions". In this case, the word Window has nothing to do with the O/S, but refers to the ability to define a "sliding window" (also called a "partition") of rows upon which analytic functions can be performed.

Window Functions can do things in a single SQL statement that would otherwise take multiple statements and temp tables, or at least a big ugly statement with lots of virtual tables or Common Table Expressions. Many people might be tempted to code these calculations in a stored procedure, but that is a mistake. Window Functions make it very easily do tasks that would be a nightmare in a procedural language.

What kind of things can you do? I've made a simple (non SAP) example. Once you understand the functions, it's easy to apply that knowledge to SAP data. 

Summary:
  1. SQL Script to create sample table: text file
  2. SQL Query to create sample output: text file
  3. Sample Output: PDF
  4. Definitions: PDF
Say you want to make a report (or a data feed for a wharehouse, in-memory tabular model or another system) that has all the columns in the sample output. Column A - E, which have unshaded headers, come straight from the database.  Columns F - L (headers shaded green) are derived from the first 5 columns.
Each of these derived columns can easily be calculated at the server using ANSI Window functions. You could calculate the derived columns on the front end, just like you could eat ice cream with a fork. You'd eventually get the job done with front end reporting formulas, but it would be messy and inefficient. 

Front end formulas that calculate results over whole sets of data are generally very ugly.  They also require a lot of data to be sent across the network. Letting the server do the work will be perform much better, be faster to code and debug, will ensure consistency, and will reduce network traffic.

Detail:
Let's focus on just one derived column: "Asc Gapless line number", which is column K in the sample output. We'll use the concept in a future post with SAP data. Here's the commented query to create the "Ascending Gapless Line Number":

SELECT



order_nr
,line_nr
,product_id
,product_qt
,unit_resale_am
/* vvvvvv Begin Derived Column for Ascending Gapless Line Nr vvvvv */

,ROW_NUMBER() /* name of the ANSI Window function */
OVER /* separates the function from the definition of the "window" (also called a partition)*/
( /* open paren to begin window definition */
PARTITION BY order_nr /* define the window. The number should reset when the order_nr changes */ ORDER BY
line_nr ASC /* sort within the window, on line_nr from lowest to highest */
) /* close paren to end window definition */
AS [Ascending Gapless Line Nr] /* give an alias to the column */






/* ^^^^^^ End Derived Column for Ascending Gapless Line Nr ^^^^ */

FROM TEST_ANSI_WINDOW /* FROM clause */
ORDER BY order_nr, line_nr /* normal ORDER BY clause. Does not have to be the same as the ORDER BY within th window, but coincidentally is the same in this case *

The basic ANSI function that assigns numbers to rows is called ROW_NUMBER()But once you invoke the function, you must define the "window" (also called a partition) over which the function is to operate. The keyword OVER separates the function from the definition of the window.

The clause (PARTITION BY <column list> ), which must be enclosed in parentheses, defines the window of rows on which to perform the function.  We want the row number to reset for each order, so the PARTITION BY clause specifies just the order_nr columnIn a more complex situation, the PARTITION BY clause could specify more than one column. 

The ORDER BY clause of the window is separate from the ORDER BY clause that may be included at the end of the query. The ordering in the window determines the sort order (and the sort key) by which the rows within the window will be ordered. Once the rows within the window are ordered, the row number will be assigned based on that order. In this case, the ORDER BY clause specifies the line_nr column in ASCending order. The "gaps" in the line numbers from the table don't effect the sort, so the derived column has sequential numbers without gaps, even though the column on which it is based contains gaps.

In the next post, we'll use the knowledge of how to create the "Asc Gapless Line Nr" column to return specific SAP data of interest.