Using wait stats on SQL Server

I wrote two posts about wait stats:

  1.  What wait means
  2. About SQL Server saving wait stats on DMVs

Now, going deeper on wait stats and see “why SQL Server is running slow?”. To answer that question I like to start with the DMV sys.dm_os_wait_stats, because this DMV provides a running total of all waits encontered by executing threads in SQL Server instance.

SQL Server categorizes waits across several different type and some of these types only indicate quit period on the instance where threads stay in waiting.

The script below shows the top wait types that have accumulated since SQL Server started or was cleared.

WITH Waits
AS (SELECT  wait_type
, CAST(wait_time_ms / 1000. AS DECIMAL(12, 2)) AS [wait_time_s]
, CAST(100. * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(12, 2)) AS [pct]
, ROW_NUMBER() OVER (ORDER BY wait_time_ms DESC) AS rn
FROM   sys.dm_os_wait_stats WITH (NOLOCK)
WHERE  wait_type NOT IN (N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK', N'SQLTRACE_BUFFER_FLUSH'
, N'WAITFOR', N'LOGMGR_QUEUE', N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'BROKER_TO_FLUSH'
, N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT', N'DISPATCHER_QUEUE_SEMAPHORE', N'FT_IFTS_SCHEDULER_IDLE_WAIT'
, N'XE_DISPATCHER_WAIT', N'XE_DISPATCHER_JOIN', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'ONDEMAND_TASK_QUEUE', N'BROKER_EVENTHANDLER'
, N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP', N'DIRTY_PAGE_POLL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
, N'SP_SERVER_DIAGNOSTICS_SLEEP', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP'
, N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'PWAIT_ALL_COMPONENTS_INITIALIZED'))
, Running_Waits
AS (SELECT  W1.wait_type
, wait_time_s
, pct
, SUM(pct) OVER (ORDER BY pct DESC ROWS UNBOUNDED PRECEDING) AS [running_pct]
FROM   Waits AS W1)
SELECT     wait_type
, wait_time_s
, pct
, running_pct
FROM      Running_Waits
WHERE     running_pct - pct <= 99
ORDER BY  running_pct
OPTION (RECOMPILE);


This query I got from Paul Randal blog, his blog has a lot of information about waits. Since not all wait types are indicators of real issue, the where clause is removing unnecessary type.

The common wait types for me are in the table below, there are much more, but you can start with that list.

Wait_Type Area Description Action
ASYNC_IO_COMPLETION I/O Used to indicate a worker is waiting on a asynchronous I/O operation to complete not associated with database pages Since this is used for various reason you need to find out what query or task is associated with the wait. Two examples of where this wait type is used is to create files associated with a CREATE DATABASE and for “zeroing” out a transaction log file during log creation or growth.
CHECKPOINT_QUEUE Buffer Used by background worker that waits on events on queue to process checkpoint requests. This is an “optional” wait type see Important Notes section in blog You should be able to safely ignore this one as it is just indicates the checkpoint background worker is waiting for work to do. I suppose if you thought you had issues with checkpoints not working or log truncation you might see if this worker ever “wakes up”. Expect higher wait times as this will only wake up when work to do
CHKPT Buffer Used to coordinate the checkpoint background worker thread with recovery of master so checkpoint won’t start accepting queue requests until master online You should be able to safely ignore. You should see 1 wait of this type for the server unless the checkpoint worker crashed and had to be restarted.. If though this is technically a “sync” type of event I left its usage as Background
CXPACKET Query Used to synchronize threads involved in a parallel query. This wait type only means a  parallel query is executing. You may not need to take any action. If you see high wait times then it means you have a long running parallel query. I would first identify the query and determine if you need to tune it. Note sys.dm_exec_requests only shows the wait type of the request even if multiple tasks have different wait types. When you see CXPACKET here look at all tasks associated with the request. Find the task that doesn’t have this wait_type and see its status. It may be waiting on something else slowing down the query. wait_resource also has interesting details about the tasks and its parallel query operator
IO_COMPLETION I/O Used to indicate a wait for I/O for operation (typically synchronous)  like sorts and various situations where the engine needs to do a synchronous I/O If wait times are high then you have a disk I/O bottleneck. The problem will be determining what type of operation and where the bottleneck exists. For sorts, it is on the storage system associated with tempdb. Note that database page I/O does not use this wait type. Instead look at PAGEIOLATCH waits.
LAZYWRITER_SLEEP Buffer Used by the Lazywriter background worker to indicate it is sleeping waiting to wake up and check for work to do You should be able to safely ignore this one. The wait times will appear to “cycle” as LazyWriter is designed to sleep and wake-up every 1 second. Appears as LZW_SLEEP in Xevent
LOGBUFFER Transaction Log Used to indicate a worker thread is waiting for a log buffer to write log blocks for a transaction This is typically a symptom of I/O bottlenecks because other workers waiting on WRITELOG will hold on to log blocks. Look for WRITERLOG waiters and if found the overall problem is I/O bottleneck on the storage system associated with the transaction log
RESOURCE_SEMAPHORE Query Used to indicate a worker is waiting to be allowed to perform an operation requiring “query memory” such as hashes and sorts High wait times indicate too many queries are running concurrently that require query memory. Operations requiring query memory are hashes and sorts. Use DMVs such as dm_exec_query_resource_semaphores and dm_exec_query_memory_grants
SOS_SCHEDULER_YIELD SQLOS Used to indicate a worker has yielded to let other workers run on a scheduler This wait is simply an indication that a worker yielded for someone else to run. High wait counts with low wait times usually mean CPU bound queries. High wait times here could be non-yielding problems
THREADPOOL SQLOS Indicates a wait for a  task to be assigned to a worker thread Look for symptoms of high blocking or contention problems with many of the workers especially if the wait count and times are high. Don’t jump to increase max worker threads especially if you use default setting of 0. This wait type will not show up in sys.dm_exec_requests because it only occurs when the task is waiting on a worker thread. You must have a worker to become a request. Furthermore, you may not see this “live” since there may be no workers to process tasks for logins or for queries to look at DMVs.
WRITELOG I/O Indicates a worker thread is waiting for LogWriter to flush log blocks. High waits and wait times indicate an I/O bottleneck on the storage system associated with the transaction log

The next posts I will show an example for each wait in the list.

Prost!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s