See what data is in what SQL Server data file?

Learn see what data is in what sql server data file? with practical examples, diagrams, and best practices. Covers sql-server, sql-server-2012 development techniques with visual explanations.

Unveiling Data Distribution: How to See What's in Your SQL Server Data Files

Hero image for See what data is in what SQL Server data file?

Discover methods to identify which data resides in specific SQL Server data files (.mdf, .ndf). This article covers using system views and T-SQL queries to gain insight into file-level data allocation, crucial for performance tuning and storage management.

Understanding how data is distributed across your SQL Server data files is a fundamental aspect of database administration. Whether you're troubleshooting performance bottlenecks, planning for storage growth, or optimizing I/O, knowing which tables and indexes occupy specific files or filegroups can provide invaluable insights. This article will guide you through various T-SQL techniques to inspect the contents of your SQL Server data files, focusing on identifying the objects (tables, indexes) that consume space within them.

Understanding SQL Server File Architecture

Before diving into queries, it's important to grasp the basics of SQL Server file architecture. A SQL Server database consists of one or more data files and at least one transaction log file. Data files typically have .mdf (primary data file) or .ndf (secondary data file) extensions. These files are organized into filegroups, which are logical containers for data files. Objects like tables and indexes can be assigned to specific filegroups, allowing for better management and performance optimization, especially across different storage tiers.

flowchart TD
    DB[SQL Server Database] --> FG1(Filegroup 1)
    DB --> FG2(Filegroup 2)
    FG1 --> MDF1[Primary Data File (.mdf)]
    FG1 --> NDF1[Secondary Data File 1 (.ndf)]
    FG2 --> NDF2[Secondary Data File 2 (.ndf)]
    MDF1 --> T1(Table A)
    MDF1 --> I1(Index A)
    NDF1 --> T2(Table B)
    NDF2 --> I2(Index B)
    DB --> LDF[Transaction Log File (.ldf)]

Simplified SQL Server Database File Architecture

Identifying Objects by Filegroup and File ID

SQL Server provides system views that allow you to query metadata about your database files and the objects within them. The sys.allocation_units view is particularly useful as it links allocation units (which store data for tables and indexes) to filegroups and ultimately to the physical files. By joining this with sys.partitions, sys.objects, and sys.filegroups, you can get a comprehensive view of data distribution.

SELECT
    DB_NAME() AS DatabaseName,
    f.name AS FilegroupName,
    df.name AS DataFileName,
    df.physical_name AS PhysicalFilePath,
    o.name AS ObjectName,
    p.index_id AS IndexID,
    i.name AS IndexName,
    au.type_desc AS AllocationUnitType,
    SUM(au.data_pages) * 8 / 1024 AS UsedMB
FROM
    sys.allocation_units AS au
INNER JOIN
    sys.partitions AS p ON au.container_id = p.partition_id
INNER JOIN
    sys.objects AS o ON p.object_id = o.object_id
INNER JOIN
    sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN
    sys.filegroups AS f ON au.data_space_id = f.data_space_id
INNER JOIN
    sys.database_files AS df ON f.data_space_id = df.data_space_id
WHERE
    o.is_ms_shipped = 0 -- Exclude system objects
    AND au.type IN (1, 3) -- IN_ROW_DATA, LOB_DATA
GROUP BY
    f.name, df.name, df.physical_name, o.name, p.index_id, i.name, au.type_desc
ORDER BY
    FilegroupName, DataFileName, UsedMB DESC;

T-SQL query to list objects and their space usage per data file.

Drilling Down to Specific Files

If you want to investigate a specific data file, you can filter the previous query by the file's name or file_id. This is particularly useful when you suspect a particular file is growing unexpectedly or is experiencing high I/O. You can find the file_id from sys.database_files.

DECLARE @TargetFileName NVARCHAR(128) = 'YourDataFile_01'; -- Replace with your file name

SELECT
    DB_NAME() AS DatabaseName,
    f.name AS FilegroupName,
    df.name AS DataFileName,
    df.physical_name AS PhysicalFilePath,
    o.name AS ObjectName,
    p.index_id AS IndexID,
    i.name AS IndexName,
    au.type_desc AS AllocationUnitType,
    SUM(au.data_pages) * 8 / 1024 AS UsedMB
FROM
    sys.allocation_units AS au
INNER JOIN
    sys.partitions AS p ON au.container_id = p.partition_id
INNER JOIN
    sys.objects AS o ON p.object_id = o.object_id
INNER JOIN
    sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN
    sys.filegroups AS f ON au.data_space_id = f.data_space_id
INNER JOIN
    sys.database_files AS df ON f.data_space_id = df.data_space_id
WHERE
    o.is_ms_shipped = 0
    AND au.type IN (1, 3)
    AND df.name = @TargetFileName -- Filter by specific file name
GROUP BY
    f.name, df.name, df.physical_name, o.name, p.index_id, i.name, au.type_desc
ORDER BY
    UsedMB DESC;

T-SQL query to find objects within a specific data file.

Using sys.dm_db_partition_stats for Detailed Space Usage

For a more granular view of space usage per partition (which corresponds to a table or index), sys.dm_db_partition_stats is an excellent dynamic management view. While it doesn't directly link to physical files, it provides detailed information about allocated and used space for each partition, which can then be correlated with the filegroup information from sys.partitions.

SELECT
    DB_NAME() AS DatabaseName,
    o.name AS ObjectName,
    i.name AS IndexName,
    p.partition_number AS PartitionNumber,
    fg.name AS FilegroupName,
    SUM(ps.in_row_data_page_alloc_count + ps.lob_data_page_alloc_count + ps.row_overflow_data_page_alloc_count) * 8 / 1024 AS AllocatedMB,
    SUM(ps.used_page_count) * 8 / 1024 AS UsedMB
FROM
    sys.dm_db_partition_stats AS ps
INNER JOIN
    sys.partitions AS p ON ps.partition_id = p.partition_id
INNER JOIN
    sys.objects AS o ON p.object_id = o.object_id
INNER JOIN
    sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
INNER JOIN
    sys.filegroups AS fg ON i.data_space_id = fg.data_space_id
WHERE
    o.is_ms_shipped = 0
GROUP BY
    o.name, i.name, p.partition_number, fg.name
ORDER BY
    AllocatedMB DESC;

T-SQL query using sys.dm_db_partition_stats to show space usage per object and filegroup.