“Don’t use * in SELECT query, instead use only required column”.
This is one of the tips to optimize SELECT query. However, does this really give better performance in SQL Server query processing?
Selecting only relevant columns doesn’t improve the performance of query processing; however there is an overall performance improvement to the system. For example, when you select only relevant column(s) from a table, the amount of data transferred from database server to frontend server will be less and data get transferred quickly.
In some cases, there will be improvement in query processing too. For example, you are extracting records from a column which is the key column of non-clustered index; in such case, records will be retrieved from non-clustered index page, thus less number of physical database page reads issued. The less physical page read is always good sign for better performance.
Is there any difference in physical page reads when we use SELECT * and SELECT COLUMNS?
“SQL Server by default copy the complete table to buffer pool irrespective of columns selection” – So no matter whether you use * or few columns.
I will elaborate this with an example. Let’s create a table and insert some records.
create table tblBuffer
(
r1 int identity(1,1),
c1 varchar(900) default replicate('a',900),
c2 varchar(900) default replicate('b',900),
c3 varchar(900) default replicate('c',900),
c4 varchar(900) default replicate('d',900),
c5 varchar(900) default replicate('e',900),
c6 varchar(900) default replicate('f',900),
c7 varchar(900) default replicate('g',900),
c8 varchar(900) default replicate('h',900)
) declare @count int
set @count = 1
while @count <= 1000
begin
insert into tblBuffer default values
set @count = @count +1
end
Once the table is ready, verify space used by the table. Executing “sp_spaceused tblBuffer” I got below result. Data size is 8000 KB approximately 1000 pages (1 page = 8KB).
Name | Rows | Reserved | Data | Index_size | Unused |
tblBuffer | 1000 | 8008 KB | 8000 KB | 8 KB | 0 KB |
In order to capture stats of page read operation, I prefer to use the performance monitor (perfmon) with following counters.
SQL Server: Buffer Manager
Database pages - Number of pages in the buffer pool with database content.
Free Pages - Total number of pages on all free lists.
Total Pages - Number of pages in the buffer pool (includes database, free, and stolen).
By now, I have everything setup and ready to do some experiment.
To verify the page read operation, we need to run the SELECT statements, but before that it is good to clean buffer. DBCC DROPCLEANBUFFERS command can be used to clean buffers from the pool.
Let’s execute DBCC DROPCLEANBUFFERS and capture the performance counters value.
(This screen scrap taken from perfmon after execution of DBCC DROPCLEANBUFFERS command)
Now, execute the query “select c1 from tblBuffer” and capture the performance counters values.
(This screen scrap taken from perfmon after execution of select c1 from tblBuffer)
I have taken the performance counter values into below table for clarity.
| After Buffer Cleaned | After query exaction : select c1 from tblBuffer | Difference |
Database Pages | 700 | 1719 | 1019 |
Free Pages | 2866 | 1839 | 1024 |
After execution of query “select c1 from tblBuffer”, the free page has reduced to 1839 from 2866 similarly number of database pages got increased from 700 to 1719. And the difference shown in the above is closely matching with the number of pages of the table.
It clearly shows that, though I select single column, the complete table has been brought into buffer pool.
Next, I will show you an example for table with non-clustered index briefly.
Let’s create a non clustered index on column “c2”
create nonclustered index idx_tblBuffer_c2 on tblBuffer(c2)
The number of pages copied to buffer pool will be approximately 110 when you run this query “select c2 from tblBuffer”... It is because, this time data are copied from non-clustered index “idx_tblBuffer_c2”.
You can verify it in perfmon as we did earlier or use the “set statistics io on” settings to verify the same.
Summary
1. SQL Sever by default copies all the columns to buffer pool, irrespective of column used in the SELECT query. 2. Though you don’t find much performance impact in the SQL Server query processing, it is good to use only the columns you needed as this will increase the overall system performance.
Post a Comment