count distinct records (all columns) not working
Counting Distinct Records Across All Columns in SQL Server

Learn how to accurately count distinct records based on all columns in a SQL Server table, addressing common pitfalls and exploring efficient methods.
When working with databases, a common requirement is to identify and count unique records. While COUNT(DISTINCT column_name)
is straightforward for a single column, counting distinct records based on all columns in a table can be more challenging. This article explores various SQL Server techniques to achieve this, discusses their implications, and provides practical examples.
Understanding the Challenge of 'Distinct All Columns'
The core challenge lies in defining what 'distinct across all columns' truly means. It implies that two rows are considered identical only if every single column value matches. SQL Server's DISTINCT
keyword applies to all columns specified in the SELECT
list. If you select *
, it will return distinct rows based on all columns in the table. However, simply wrapping COUNT(*)
around SELECT DISTINCT *
is not valid SQL syntax.
flowchart TD A[Start: Need Distinct Row Count] --> B{Are all columns relevant?} B -->|Yes| C[Consider all columns for uniqueness] B -->|No| D[Specify relevant columns] C --> E[Method 1: COUNT(DISTINCT HASHBYTES)] C --> F[Method 2: COUNT(*) OVER (PARTITION BY ...)] C --> G[Method 3: Subquery with DISTINCT] D --> H[Use COUNT(DISTINCT col1, col2, ...)] E --> I[Result: Distinct Row Count] F --> I G --> I H --> I
Decision flow for counting distinct records.
Method 1: Using a Subquery with DISTINCT
The most direct and often clearest way to count distinct rows based on all columns is to use a subquery. You first select all distinct rows, and then count the number of rows returned by that subquery. This approach is highly readable and generally performs well for moderately sized tables.
SELECT COUNT(*)
FROM (
SELECT DISTINCT *
FROM YourTableName
) AS DistinctRecords;
*Counting distinct records using a subquery with SELECT DISTINCT **
TEXT
, NTEXT
, IMAGE
, XML
, GEOMETRY
, GEOGRAPHY
, or UDT
columns if you're using SELECT DISTINCT *
, as these types cannot be directly compared for distinctness.Method 2: Using HASHBYTES for Large Tables or Incompatible Data Types
When dealing with very wide tables, or tables containing data types that DISTINCT
cannot directly compare (like TEXT
, NTEXT
, IMAGE
, XML
), generating a hash of each row can be an effective strategy. You concatenate all column values into a single string (or binary representation) and then hash it. Counting the distinct hash values gives you the distinct row count. This method requires careful handling of NULL
values and data type conversions to ensure consistent hashing.
SELECT COUNT(DISTINCT RowHash)
FROM (
SELECT HASHBYTES('SHA2_256',
CONCAT(
ISNULL(CAST(Column1 AS NVARCHAR(MAX)), ''),
ISNULL(CAST(Column2 AS NVARCHAR(MAX)), ''),
-- ... include all columns, converting to NVARCHAR(MAX)
ISNULL(CAST(ColumnN AS NVARCHAR(MAX)), '')
)
) AS RowHash
FROM YourTableName
) AS HashedRecords;
Counting distinct records using HASHBYTES for row uniqueness
HASHBYTES
is powerful, it's computationally intensive and can be slow for extremely large tables. Also, be aware of hash collisions, though with SHA2_256, the probability is extremely low for typical database sizes. Ensure all columns are explicitly cast to a comparable type (e.g., NVARCHAR(MAX)
) and NULL
s are handled consistently (e.g., ISNULL(col, '')
).Method 3: Using ROW_NUMBER() with PARTITION BY
Another robust approach, especially useful if you need to retrieve the distinct rows themselves or perform further operations, is to use the ROW_NUMBER()
window function. By partitioning by all columns, you assign a rank to each row within groups of identical rows. Then, you can filter for ROW_NUMBER() = 1
to get one representative for each distinct group, and count those.
SELECT COUNT(*)
FROM (
SELECT
Column1, Column2, -- ... all columns
ROW_NUMBER() OVER (PARTITION BY Column1, Column2, ..., ColumnN ORDER BY (SELECT NULL)) AS rn
FROM YourTableName
) AS RankedRecords
WHERE rn = 1;
Counting distinct records using ROW_NUMBER() over all columns
ORDER BY (SELECT NULL)
clause in ROW_NUMBER()
is a common idiom in SQL Server when the order within a partition doesn't matter for distinctness. This method is very flexible if you need to select the distinct rows themselves, not just their count.