Selecting COUNT(*) with DISTINCT
Understanding COUNT(*) with DISTINCT in SQL Server

Explore the nuances of using COUNT(*) with DISTINCT in SQL Server, including common pitfalls, performance considerations, and alternative approaches for accurate and efficient data aggregation.
When working with SQL Server, accurately counting unique occurrences of data is a common requirement. The COUNT()
aggregate function is fundamental for this, but its behavior changes significantly when combined with the DISTINCT
keyword. This article delves into how COUNT(DISTINCT column_name)
works, its implications, and when to consider alternative strategies for optimal performance and correctness, especially in SQL Server 2005 and later versions.
The Basics: COUNT(*) vs. COUNT(DISTINCT column_name)
Before diving into the specifics, it's crucial to understand the difference between COUNT(*)
and COUNT(DISTINCT column_name)
.
COUNT(*)
: This counts all rows in a specified table or view, including duplicates and rows withNULL
values. It's often used to get the total number of records.COUNT(column_name)
: This counts all non-NULL values in the specifiedcolumn_name
. It ignoresNULL
values but includes duplicates.COUNT(DISTINCT column_name)
: This is where the magic happens for uniqueness. It counts only the unique, non-NULL values in the specifiedcolumn_name
. This is the primary method for determining the number of unique entries in a column.
SELECT
COUNT(*) AS TotalRows,
COUNT(ColumnA) AS NonNullColumnA,
COUNT(DISTINCT ColumnA) AS UniqueNonNullColumnA
FROM
YourTable;
Basic usage of COUNT functions in SQL Server.
Performance Considerations and Alternatives
While COUNT(DISTINCT column_name)
is straightforward, it can be a performance bottleneck on large datasets. SQL Server needs to sort the data (or use a hash aggregate) to identify unique values, which can be resource-intensive. For very large tables, this operation can lead to significant I/O and CPU usage.
Consider the following diagram illustrating the process SQL Server might take to execute COUNT(DISTINCT)
:
flowchart TD A[Start Query] --> B{Scan Table/Index} B --> C[Extract Column Values] C --> D["Sort/Hash Aggregate (Identify Unique Values)"] D --> E[Count Unique Values] E --> F[Return Result] F --> G[End Query]
Conceptual flow for COUNT(DISTINCT column_name) execution.
For performance-critical scenarios, especially when dealing with very large tables or frequent queries, you might explore alternatives:
Subquery with
GROUP BY
: This approach first groups the data by the column(s) you want to count distinctly, and then counts the resulting groups. This can sometimes be optimized differently by the query optimizer.ROW_NUMBER()
withPARTITION BY
: For more complex scenarios where you need to identify unique rows based on multiple columns and then count them,ROW_NUMBER()
can be powerful. You can assign a row number within partitions and then count rows whereROW_NUMBER() = 1
.Indexed Views: If the distinct count is frequently needed and the underlying data doesn't change too often, an indexed view can pre-calculate and store the distinct values, offering significant performance gains. However, indexed views have creation and maintenance overheads.
-- Alternative 1: Using GROUP BY
SELECT COUNT(*) FROM (
SELECT DISTINCT ColumnA
FROM YourTable
) AS DistinctValues;
-- Alternative 2: Using ROW_NUMBER() (for counting unique combinations of multiple columns)
SELECT COUNT(*) FROM (
SELECT
ColumnA, ColumnB,
ROW_NUMBER() OVER (PARTITION BY ColumnA, ColumnB ORDER BY ColumnA) as rn
FROM YourTable
) AS SubQuery
WHERE rn = 1;
Alternative methods for counting distinct values.
COUNT(DISTINCT)
is often optimized well by modern SQL Server versions, but alternatives are worth knowing for troubleshooting performance.Common Pitfalls and Best Practices
When using COUNT(DISTINCT)
, be mindful of these points:
NULL
Values:COUNT(DISTINCT column_name)
explicitly excludesNULL
values. IfNULL
should be considered a distinct value, you'll need to handle it explicitly, perhaps by replacingNULL
with a placeholder before counting or using aUNION
with a check forNULL
.Multiple Columns: If you need to count distinct combinations of multiple columns, you must list all columns within the
DISTINCT
clause, e.g.,COUNT(DISTINCT ColumnA, ColumnB)
. Note that older SQL Server versions (like 2005) might not directly supportCOUNT(DISTINCT col1, col2)
. In such cases, you'd typically useCOUNT(DISTINCT col1 + '|' + col2)
(being careful with data types and potential delimiters) or theGROUP BY
subquery approach.Data Types: Be aware of data type conversions if you concatenate columns for distinct counting. Implicit conversions can lead to performance issues or incorrect results.
Indexing: An index on the column(s) used in
COUNT(DISTINCT)
can significantly improve performance, as it can help the optimizer quickly find and sort unique values. A covering index (an index that includes all columns needed by the query) is even better.
COUNT(DISTINCT col1, col2)
syntax is not directly supported. You would typically use a subquery with GROUP BY
or concatenate the columns into a single string (e.g., COUNT(DISTINCT CAST(col1 AS VARCHAR(MAX)) + '-' + CAST(col2 AS VARCHAR(MAX)))
) to achieve the same effect, being cautious about data type conversions and potential string length limits.