Physical location of a row in SQL Server

Introduction

First a warning: These are undocumented and unsupported features in SQL Server so use at your own risk!

In Oracle there is a pseudo column for each row called ROWID which reveals the physical address of a row in the database file. This column can be used to identify a single row even when a row doesn’t have a key. Based on documentation SQL Server seems to lack this kind of functionality but that’s not quite true since SQL Server also has a mechanism to identify the physical address of a row. In SQL Server 2008 this is called %%physloc%% and in SQL Server 2005 %%lockres%%. This article tries to describe the basic usages of this pseudo column in SQL Server 2008.

How is it used

First let’s create a table to test this feature. In order to test different usages we’ll need a table with several rows.

----------------------------------
-- Create test objects
----------------------------------
-- Schema
CREATE SCHEMA PhysLocTest;
GO

-- SeveralRows -table
IF OBJECT_ID ( 'PhysLocTest.SeveralRows', 'U' ) IS NOT NULL 
DROP TABLE PhysLocTest.SeveralRows;
GO

-- Create the table
CREATE TABLE PhysLocTest.SeveralRows (
   Id         int        NOT NULL IDENTITY(1,1)PRIMARY KEY,
   InsertTime date       NOT NULL DEFAULT (GETDATE()),
   Category   varchar(2) NOT NULL
);
GO

-- Fill the table with test data. Contains 100’000 rows in 11 categories
SET NOCOUNT ON
DECLARE @counter int;
BEGIN
   SET @counter = 0;
   WHILE @counter < 100000 BEGIN
      INSERT INTO PhysLocTest.SeveralRows (Category) 
      VALUES (CONVERT(varchar, ROUND( RAND(), 1) * 10 ));

      SET @counter = @counter + 1;
   END;
END;

After the table is created and filled you can try the %%physloc%% pseudo column:

-----------------------------------------
-- Find physical address of first 5 rows
-----------------------------------------
SELECT TOP(5)
       a.%%physloc%% AS Address,
       a.*
FROM   PhysLocTest.SeveralRows a
ORDER BY a.Id;

-- Results (physical locations, Id's and categories vary):
 
Address             Id  InsertTime  Category
------------------  --  ----------  --------
0xFB0D000001000000  1   2011-02-19  3
0xFB0D000001000100  2   2011-02-19  2
0xFB0D000001000200  3   2011-02-19  4
0xFB0D000001000300  4   2011-02-19  2
0xFB0D000001000400  5   2011-02-19  1

At this point you’ll see that each row has an unique address. The address actually contains information about the file, page and the slot the row is in. However the hexadecimal value isn’t very easy to interpret so SQL Server has a function called sys.fn_PhysLocFormatter to better visualize the location of the row.

Using sys.fn_PhysLocFormatter

This function takes the physical address as a parameter and formats the address to text to show the location of a row.

-----------------------------------------
-- Find physical address of first 5 rows
-----------------------------------------
SELECT TOP(5)
       a.%%physloc%%                          AS Address,
       sys.fn_PhysLocFormatter(a.%%physloc%%) AS AddressText,
       a.*
FROM   PhysLocTest.SeveralRows a
ORDER BY a.Id;
 

-- Results (physical locations and categories vary):

Address             AddressText  Id  InsertTime  Category
------------------  -----------  --  ----------  --------
0xFB0D000001000000  (1:3579:0)   1   2011-02-19  3
0xFB0D000001000100  (1:3579:1)   2   2011-02-19  2
0xFB0D000001000200  (1:3579:2)   3   2011-02-19  4
0xFB0D000001000300  (1:3579:3)   4   2011-02-19  2
0xFB0D000001000400  (1:3579:4)   5   2011-02-19  1

So now you have the physical address in clear format. Based on the output row with Id 3 is located in the file 1 on page 3579 and in slot 2. Now you can for example identify the actual data file the row is located in using system view sys.database_files.

-----------------------------------------
-- Find the actual database file
-----------------------------------------
SELECT df.type_desc,
       df.name,
       df.physical_name
FROM   sys.database_files df
WHERE  df.file_id = 1;
 
-- Results:
type_desc  name    physical_name
---------  ------  -------------
ROWS       Test02  C:\Program Files\Microsoft SQL Server\Inst1\MSSQL\DATA\test02.ndf

If you want to go further you can dump the contents of the block using DBCC PAGE. In order to view information from DBCC PAGE, trace flag 3604 has to be set on. The DBCC PAGE command takes the following parameters:

Database name or database Id
Number of the file
Number of the page
Level of detail in the output

0 = header
1 = header and hex dump for rows
2 = header and the page dump
3 = header and detail row information

-----------------------------------------
-- Get the page dump for rows
-----------------------------------------
DBCC TRACEON(3604)
DBCC PAGE (Test, 1, 3579, 1)
DBCC TRACEOFF(3604)
 
-- Results:
PAGE: (1:3579)
 
BUFFER:
BUF @0x0000000084FD4200
bpage = 0x00000000848B0000 bhash = 0x0000000000000000 bpageno = (1:3579)
bdbid = 15 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 5565 bstat = 0xc0010b
blog = 0xbbbbbbbb bnext = 0x0000000000000000 
 
PAGE HEADER:
Page @0x00000000848B0000
m_pageId = (1:3579) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 365 m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594061848576 
Metadata: PartitionId = 72057594056081408 Metadata: IndexId = 1
Metadata: ObjectId = 1746821285 m_prevPage = (0:0) m_nextPage = (1:3582)
pminlen = 11 m_slotCnt = 384 m_freeCnt = 10
m_freeData = 7414 m_reservedCnt = 0 m_lsn = (78:1213:16)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = 0 
 
Allocation Status
GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED 
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED 
 
DATA:
Slot 0, Offset 0x60, Length 19, DumpStyle BYTE
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 19 
Memory Dump @0x000000001613A060
0000000000000000: 30000b00 01000000 ea330b03 00000100 †0.......ê3...... 
0000000000000010: 130033†††††††††††††††††††††††††††††††..3 
...

Can this be used to speed up fetches

In Oracle ROWID is sometimes used to speed up fetches (although it’s not advisable). Since the ROWID is the physical location of the row there’s no need to locate the row for example using an index if the ROWID is already known. Does the same apply to SQL Server. The answer is simply no. %%physloc%% acts more like a function so if it is used in the WHERE clause of an SQL statement SQL Server needs to scan the row locations and then pick up the matching location. To test this let’s try to select a single row from the test table using both Id (primary key) and the location.

-----------------------------------------
-- Fetch the row based on primary key
-----------------------------------------
SELECT a.%%physloc%% AS Address,
       a.*
FROM   PhysLocTest.SeveralRows a
WHERE  a.Id = 4321
 

-- Results:
Address             Id    InsertTime  Category
------------------  ----  ----------  --------
0x641E000001006100  4321  2011-02-19  8
 
-- Statistics:
Table 'SeveralRows'. Scan count 0, logical reads 2, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the visual execution plan (total cost 0,0032831, estimated number of rows from index seek = 1):

That was quite efficient. When using the Id column in the condition it’s clearly seen in both execution plan and the statistics that SQL Server is using the primary key index to fetch the row. Now what happens if the physical location is used instead.

-------------------------------------------
-- Fetch the row based on physical location
-------------------------------------------
SELECT a.%%physloc%% AS Address,
       a.*
FROM   PhysLocTest.SeveralRows a
WHERE  a.%%physloc%% = 0x641E000001006100
 
-- Results:
Address             Id    InsertTime  Category
------------------  ----  ----------  --------
0x641E000001006100  4321  2011-02-19  8
 
-- Statistics:
Table 'SeveralRows'. Scan count 1, logical reads 29, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And the visual execution plan (total cost 0,323875, estimated number of rows from index scan = 100'000):

Now SQL Server has to scan the clustered primary key index in order to find the correct physical address. Scan count is 1 and 29 pages are read. Not very good choice from performance point of view. So we can safely say that using physical location does not provide any kind of performance improvement if used.

Identifying a row without a key

One use-case for physical location is to identify a single row in a table even when the table does not have a key defined on it. A quite common issue is: How can I delete duplicate rows when I don’t have a key to use. There are several possibilities to do that. One is to add a key to the table and use it for deletion or perhaps to create a temporary table, load unique rows with DISTINCT keyword into the temp table, truncate the original table and reload the distinct data back. Both of these approaches work but they have few issues one should think of.

In the first solution (add a key) the table structure is changed, actually the row is widened. Changing the structure is something that should naturally be avoided when simple DML operations (SELECT, INSERT, UPDATE) are used. In the second solution (reload using a temp table) all the data is deleted and then added back to the original table. This may cause problems and inconsistent data if there are for example triggers on the original table.

In our test table there was a primary key but let’s forget that for a moment. What if I want to delete all duplicate rows so that only one row is left for each category? If the random number generator worked well there should be 11 categories in the table, values 0-10. So basically I have to create a delete statement which excludes one row for each category based on physical location. In the following example I decided to leave the smallest location for each category and delete the rest. Note: Smallest location isn't necessarily the firstly added row which could be an easy misunderstanding.

-------------------------------------------
-- Delete duplicates based on Category
-------------------------------------------
DELETE
FROM  PhysLocTest.SeveralRows 
WHERE PhysLocTest.SeveralRows.%%physloc%% 
NOT IN (SELECT MIN(b.%%physloc%%)
        FROM   PhysLocTest.SeveralRows b
        GROUP BY b.Category);
 
-------------------------------------------
-- Check the data
-------------------------------------------
SELECT * 
FROM   PhysLocTest.SeveralRows a
ORDER BY a.Category;
 
-- Results
Id     InsertTime  Category
-----  ----------  --------
9      2011-02-19  0
5      2011-02-19  1
6      2011-02-19  10
2      2011-02-19  2
1      2011-02-19  3
3      2011-02-19  4
21     2011-02-19  5
14     2011-02-19  6
10     2011-02-19  7
17     2011-02-19  8
16     2011-02-19  9
 
(11 row(s) affected)

Task accomplished, duplicates have been removed.

Conclusions

%%physloc%% pseudo column helps to locate a physical row and dump the contents of the page. This is handy if you want to investigate the structure of the block or you want to find out the actual row data stored. This pseudo column also helps in situations where you have to identify a single row even if you don’t have a key defined on the table. In normal SQL Server usage this column does not provide any improvements that could be used for example in programming. So never ever store the value of the physical location in a variable and try to use the value it afterwards. No performance is gained, actually vice versa.

When migrating an application from Oracle %%physloc%% can be used to replace ROWID but some other approach, most likely a key, should be implemented as soon as possible in order to have decent response times.

Also keep in mind that the physical location of a row can change for example when clustered index is rebuilt. And of course since this is an undocumented and unsupported feature it may disappear in the next release of SQL Server.

History

February 20th, 2011: Created

关键字 physical,location,of,row,in,sql,server

推荐.NET配套的通用数据层ORM框架：CYQ.Data 通用数据层框架

IT技术博客

Introduction

How is it used

Using sys.fn_PhysLocFormatter

Can this be used to speed up fetches

Identifying a row without a key

Conclusions

History

发表评论

公告信息

文章搜索

文章分类

文章档案

最新文章

最新评论

阅读排行榜

评论排行榜

友情链接