Every database has a transaction log that records all transactions and the database modifications made by each transaction
The transaction log is a critical component of the database. If there is a system failure, you will need that log to bring your database back to a consistent state.
How that works? What steps SQL Server do to keep the consistent state?
Taking a example I want to update 10 rows in my table and those rows are using 2 pages, let’s see step by step SQL Server does.
- Read the pages from buffer pool, if the pages isn’t there SQL Server will get that data from disk and put in memory.
- Start the lock process, SQL Server will acquire intent-exclusive lock for the table and pages, update row-level locks.
- After all rows to be updates are locked, it’s time to convert the update lock to exclusive lock and make the change to the page in memory. This process will create a log record describing the changes were made.
- Next the transaction is ready to commit. This step will write the log records to the transaction log on disk and will acknowledge the commit to the user.
The changed data still in the buffer pool and written in the transaction log on disk. The data will stay in memory util a checkpoint runs and write the changes to the data files.
So, that’s why transaction logs are critical and we have to take care of it.
How to manage the transaction log
How to read transaction logs
Let’s talk about why we have to wait and how to understand the wait types.
Paul Randal in his post Wait statistics, or please tell me where it hurts said:
A thread is using the CPU (called RUNNING) until it needs to wait for a resource. It then moves to an unordered list of threads that are SUSPENDED. In the meantime, the next thread on the FIFO (first-in-first-out) queue of threads waiting for the CPU (called being RUNNABLE) is given the CPU and becomes RUNNING. If a thread on the SUSPENDED list is notified that it’s resource is available, it becomes RUNNABLE and is put on the bottom of the RUNNABLE queue. Threads continue this clockwise movement from RUNNING to SUSPENDED to RUNNABLE to RUNNING again until the task is completed
That’s explain a lot, because the SQL Server threads doesn’t run all in the same time. A good example is when our query is doing physical reads. The IO subsystem is the slowest part of our resources and probably will take some time if the query is reading gigabytes of data.
After the CPU request the data we are querying the disk will run for it, but before send the data back there another step. All data need to go to memory first and imagine a scenario that you don’t have space and first need to open space for that data you need. How many threads are using to run this query? How long will take to the application to show that data?
So, in all this process we can see a couple of waits for example, like PAGEIOLATCH_XX , PAGELATCH_XX, ASYNC_NETWORK_IO, CXPACKET and I will write in detail about of each in the next posts.
What’s a heap table? I would say it’s a table without clustered index.
What’s the characteristic of a heap table? The data isn’t ordered.
What’s the consequence having a heap table? There are a few:
- Specific data is not retrived quickly
- Data pages aren’t linked, that means sequential access needs to refer to the index allocation map (IAM) pages
- No cost to update indexes
- No additional space to store clustered index