The Meaning of “Set-Based”

Posted by

The term set-based is used to describe an approach to handle querying tasks and is based on principles from the relational model. Remember that the relational model is based in part on mathematical set theory. Set-based solutions use T-SQL queries, which operate on the input tables as sets of rows. Such solutions are contrasted with iterative solutions that use cursors or other iterative constructs to handle one row at a time.

According to set theory, a set should be considered as a whole. This means that your attention should be focused on the set and not on its individual elements. With iterative solutions, you break this principle by operating on one element (row) at a time. Also, a set has no particular order to its elements. So when you use set-based solutions, you cannot make any assumptions about the order of the data. Similarly, unless you add an ORDER BY clause to the query, you’re not guaranteed that the data will be returned in any particular order. With iterative solutions, you process one row at a time and you can do so in a specific order.

When you use set-based solutions, you provide your request as a declarative plain language query. In your request, you focus on the “what” part of the request and let the database engine worry about the “how” part. With iterative solutions, you need to implement both the what and the how parts in your code. As a result, iterative solutions tend to be much longer than set-based ones and harder to follow and maintain.

Another reason why you should stick to set-based solutions is a very pragmatic one—performance. Iterative constructs in T-SQL are very slow. For one thing, loops in T-SQL are much slower than those in other programming languages such as in Microsoft .NET code. Secondly, each record fetch from a cursor by using the FETCH NEXT command has quite a high overhead associated with it. There’s no such overhead when SQL Server processes a set-based solution, even if internally the execution plan for the query involves iterations. As a result, if you know how to tune queries, you are often able to achieve much better performance compared to using iterative solutions.

For example, I have seen something like this:
SET NOCOUNT ON;
DECLARE @Result AS TABLE (
actid INT,
mx MONEY );

DECLARE
@actid AS INT,
@val AS MONEY,
@prevactid AS INT,
@prevval AS MONEY;

INSERT INTO #tbTemp
SELECT actid, val
  FROM dbo.Transactions
  ORDER BY actid, val;

SELECT TOP(1) @prevactid = actid, @prevval = val
FROM #tbTemp;

WHILE @prevactid NOT IS NULL
BEGIN
IF @actid @prevactid
     INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);
 

  DELETE #tbTemp
  WHERE actid = @actid;
 

  SELECT TOP(1) @prevactid = @actid, @prevval = @val
  FROM #tbTemp;
END

IF @prevactid  IS NOT NULL
INSERT INTO @Result(actid, mx) VALUES(@prevactid, @prevval);

SELECT actid, mx
FROM @Result;
GO

Some people think this solution is set-based because it doesn’t explicitly declare and use a cursor object (we talk about this later or never 🙂 ). However, recall that one of the principles implemented by set-based solutions is that they treat the set as a whole as opposed to handling one element at a time. This principle is violated here. Also, set-based solutions do not rely on the order of the data, and this one does.

How we can do this with a set-based rule. Well, lets see the following code:
SELECT
T1.actid, T1.tranid, T1.val,
SUM(T2.val) AS balance
FROM dbo.Transactions AS T1
JOIN dbo.Transactions AS T2 ON T2.actid = T1.actid AND T2.tranid <= T1.tranid
GROUP BY T1.actid, T1.tranid, T1.val;

The code above use a recursive join, recursive querys are commonly used with CTE (we talk later) and for this time this works well, but the best way in SQL Server 2012 is using Window Functions.

SELECT actid, tranid, val,
SUM(val) OVER( PARTITION BY actid
               ORDER BY tranid
               ROWS UNBOUNDED PRECEDING)
AS balance
FROM dbo.Transactions;

The conclusion is, always have a way to work with sets, the first intent wasn’t set-based because works one by one element at time.
I used that code with a temporary table (#tbTemp) because I will discuss this subject in the next post.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s