LINQ: Distinct or GroupBy for Multiple Properties

Explore how to effectively use LINQ's Distinct and GroupBy methods to filter unique combinations of multiple properties in C# collections.
When working with collections in C#, a common requirement is to extract unique items based on one or more properties. LINQ provides powerful methods like Distinct() and GroupBy() that can achieve this, but applying them to multiple properties requires a closer look at how equality is determined. This article will guide you through various techniques, including custom comparers, anonymous types, and GroupBy clauses, to efficiently handle uniqueness across multiple fields.
Understanding Distinct() with Multiple Properties
The Distinct() extension method, by default, uses the default equality comparer for the type of elements in the sequence. For simple types (like int and string), this works as expected. For custom classes, however, the default comparer falls back to reference equality unless the class overrides Equals and GetHashCode, so two instances with identical property values are treated as different. If you want to consider multiple properties for uniqueness, you have a few options.
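To make the default behavior concrete, here is a minimal sketch. It assumes a cut-down `Product` class with only two properties (a stand-in for the class used later in this article) and C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two separate instances with identical property values.
var items = new List<Product>
{
    new Product { Name = "Laptop", Category = "Electronics" },
    new Product { Name = "Laptop", Category = "Electronics" }
};

// Product does not override Equals/GetHashCode, so the default comparer
// uses reference equality and both instances are kept.
int distinctCount = items.Distinct().Count();
Console.WriteLine(distinctCount); // 2

// Strings compare by value, so the duplicate name collapses.
int distinctNames = items.Select(p => p.Name).Distinct().Count();
Console.WriteLine(distinctNames); // 1

// Minimal stand-in for the article's Product class (assumption for this sketch).
public class Product
{
    public string Name { get; set; } = "";
    public string Category { get; set; } = "";
}
```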
Distinct() operates on the entire object. If you only want uniqueness based on a subset of properties, you'll need to project your data or provide a custom comparer.
Method 1: Projecting to an Anonymous Type
The simplest way to get distinct combinations of multiple properties is to project your collection into an anonymous type containing only the properties you care about, and then apply Distinct() to this new collection. The C# compiler generates Equals and GetHashCode implementations for anonymous types based on their property values, so the default equality comparer treats two instances with the same values as equal.
// Requires: using System; using System.Collections.Generic; using System.Linq;
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
}

List<Product> products = new List<Product>
{
    new Product { Id = 1, Name = "Laptop", Category = "Electronics", Price = 1200M },
    new Product { Id = 2, Name = "Mouse", Category = "Electronics", Price = 25M },
    new Product { Id = 3, Name = "Keyboard", Category = "Electronics", Price = 75M },
    new Product { Id = 4, Name = "Laptop", Category = "Electronics", Price = 1300M }, // Different price
    new Product { Id = 5, Name = "Desk", Category = "Furniture", Price = 300M },
    new Product { Id = 6, Name = "Mouse", Category = "Electronics", Price = 30M } // Different price
};

// Get distinct Name and Category combinations
var distinctNameCategory = products
    .Select(p => new { p.Name, p.Category })
    .Distinct()
    .ToList();

foreach (var item in distinctNameCategory)
{
    Console.WriteLine($"Name: {item.Name}, Category: {item.Category}");
}

/* Output:
Name: Laptop, Category: Electronics
Name: Mouse, Category: Electronics
Name: Keyboard, Category: Electronics
Name: Desk, Category: Furniture
*/
Using anonymous types with Distinct() for multiple properties.
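The value-based equality of anonymous types can be checked directly. The following sketch uses made-up values and shows that two separately created instances compare equal by value even though they are different references, and that string properties are compared case-sensitively.

```csharp
using System;

var a = new { Name = "Laptop", Category = "Electronics" };
var b = new { Name = "Laptop", Category = "Electronics" };
var c = new { Name = "laptop", Category = "Electronics" };

// Compiler-generated Equals/GetHashCode compare property values.
bool sameValues = a.Equals(b);
bool sameHash = a.GetHashCode() == b.GetHashCode();

// a and b are still two distinct objects on the heap.
bool sameReference = object.ReferenceEquals(a, b);

// String properties use the default (ordinal, case-sensitive) comparison.
bool ignoresCase = a.Equals(c);

Console.WriteLine(sameValues);    // True
Console.WriteLine(sameHash);      // True
Console.WriteLine(sameReference); // False
Console.WriteLine(ignoresCase);   // False
```

If case-insensitive matching is needed, one option is to normalize the values in the projection (for example with ToUpperInvariant()) before calling Distinct().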
Method 2: Implementing a Custom IEqualityComparer
For more control or when you need to reuse the comparison logic, you can implement a custom IEqualityComparer<T>. This approach is more verbose but provides a robust and reusable solution, especially if you need to define 'equality' for your custom objects in a specific way that isn't just based on all properties.
public class ProductComparer : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;

        // Define equality based on Name and Category
        return x.Name == y.Name && x.Category == y.Category;
    }

    public int GetHashCode(Product obj)
    {
        if (ReferenceEquals(obj, null)) return 0;

        // Combine hash codes of Name and Category
        int hashName = obj.Name == null ? 0 : obj.Name.GetHashCode();
        int hashCategory = obj.Category == null ? 0 : obj.Category.GetHashCode();
        return hashName ^ hashCategory; // Simple XOR combination
    }
}

// Using the custom comparer
var distinctProducts = products.Distinct(new ProductComparer()).ToList();

foreach (var product in distinctProducts)
{
    Console.WriteLine($"Id: {product.Id}, Name: {product.Name}, Category: {product.Category}");
}

/* Output:
Id: 1, Name: Laptop, Category: Electronics
Id: 2, Name: Mouse, Category: Electronics
Id: 3, Name: Keyboard, Category: Electronics
Id: 5, Name: Desk, Category: Furniture
*/
Implementing IEqualityComparer<T> for custom distinct logic.
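One caveat about the XOR combination used in the comparer: XOR is symmetric, so swapping the two values produces the same hash code, and two equal values always produce 0. The sketch below illustrates this with made-up strings and shows `HashCode.Combine` as a possible alternative. It assumes .NET Core 2.1 or later (or another runtime where `System.HashCode` is available); on older frameworks a manual multiply-and-add combination would be needed instead.

```csharp
using System;

// Two different (Name, Category) pairs whose values are swapped.
string name1 = "Desk", category1 = "Chair";
string name2 = "Chair", category2 = "Desk";

// XOR does not depend on operand order, so both pairs hash identically.
int xor1 = name1.GetHashCode() ^ category1.GetHashCode();
int xor2 = name2.GetHashCode() ^ category2.GetHashCode();
Console.WriteLine(xor1 == xor2); // True

// XOR of a value with itself is always 0, so every pair with
// Name == Category lands in the same bucket.
int xorSame = "Desk".GetHashCode() ^ "Desk".GetHashCode();
Console.WriteLine(xorSame); // 0

// HashCode.Combine mixes values in an order-sensitive way and treats
// null as 0, so swapped pairs are very unlikely to collide.
int combined1 = HashCode.Combine(name1, category1);
int combined2 = HashCode.Combine(name2, category2);

// Equal inputs still give equal hash codes within the same process.
bool stable = combined1 == HashCode.Combine("Desk", "Chair");
Console.WriteLine(stable); // True
```

In the comparer above, this would amount to replacing the body of GetHashCode with `return HashCode.Combine(obj.Name, obj.Category);`. Collisions never affect correctness, because Equals is still consulted, but fewer collisions mean fewer Equals calls.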
Method 3: Using GroupBy() for Uniqueness
The GroupBy() method is primarily used for grouping elements that share a common key. However, it can also be leveraged to achieve uniqueness. By grouping by the desired properties and then selecting the first element from each group, you effectively get a distinct set. This approach is often more intuitive for many developers when dealing with multiple properties.
// Group by Name and Category, then select the first item from each group
var distinctByGroupBy = products
    .GroupBy(p => new { p.Name, p.Category })
    .Select(g => g.First())
    .ToList();

foreach (var product in distinctByGroupBy)
{
    Console.WriteLine($"Id: {product.Id}, Name: {product.Name}, Category: {product.Category}");
}

/* Output:
Id: 1, Name: Laptop, Category: Electronics
Id: 2, Name: Mouse, Category: Electronics
Id: 3, Name: Keyboard, Category: Electronics
Id: 5, Name: Desk, Category: Furniture
*/
Achieving distinctness using GroupBy() and selecting the first element.
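Because each group still holds all of its elements, GroupBy() also lets you decide which element represents a combination, rather than always taking the first one. The following sketch is one possible variation, not part of the original examples: it keeps the cheapest product per Name and Category and reports how many items shared that combination. The `Product` class and sample data are repeated here in reduced form so the block stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var products = new List<Product>
{
    new Product { Id = 1, Name = "Laptop", Category = "Electronics", Price = 1200M },
    new Product { Id = 2, Name = "Mouse", Category = "Electronics", Price = 25M },
    new Product { Id = 4, Name = "Laptop", Category = "Electronics", Price = 1300M },
    new Product { Id = 6, Name = "Mouse", Category = "Electronics", Price = 30M }
};

// Group by the composite key, then pick the cheapest item in each group.
var cheapestPerGroup = products
    .GroupBy(p => new { p.Name, p.Category })
    .Select(g => new
    {
        g.Key.Name,
        g.Key.Category,
        Count = g.Count(),
        Cheapest = g.OrderBy(p => p.Price).First()
    })
    .ToList();

foreach (var row in cheapestPerGroup)
{
    Console.WriteLine($"{row.Name} ({row.Category}): {row.Count} items, cheapest Id {row.Cheapest.Id}");
}

/* Output:
Laptop (Electronics): 2 items, cheapest Id 1
Mouse (Electronics): 2 items, cheapest Id 2
*/

// Reduced copy of the article's Product class.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Category { get; set; } = "";
    public decimal Price { get; set; }
}
```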
flowchart TD
    A[Start with Collection] --> B{Project to Anonymous Type?}
    B -- Yes --> C[Select new { Prop1, Prop2 }] --> D[Apply .Distinct()]
    B -- No --> E{Need Custom Equality Logic?}
    E -- Yes --> F[Implement IEqualityComparer<T>] --> G[Apply .Distinct(comparer)]
    E -- No --> H{Want to Group and Pick First?}
    H -- Yes --> I[Apply .GroupBy(new { Prop1, Prop2 })] --> J[Select g.First()]
    H -- No --> K[Consider other LINQ methods or re-evaluate requirements]
    D --> L[Result: Distinct Anonymous Types]
    G --> M[Result: Distinct Original Objects]
    J --> N[Result: Distinct Original Objects]
    L & M & N --> O[End]
Decision flow for choosing between Distinct() and GroupBy() for multiple properties.
Performance Considerations
While all these methods achieve the desired result, their performance characteristics can differ, especially with large datasets.
- Anonymous types with Distinct(): Generally efficient. The compiler generates Equals and GetHashCode for the anonymous type, and Distinct() uses them through the default equality comparer.
- Custom IEqualityComparer<T>: Can be very efficient if the Equals and GetHashCode methods are well implemented. Poor GetHashCode implementations can lead to performance degradation due to increased collisions in hash tables.
- GroupBy(): Involves creating intermediate groups that hold every element, which can sometimes be slightly less performant than Distinct() with a well-optimized comparer, especially if you only need the distinct keys and not the grouped items. However, for many scenarios, the difference is negligible and GroupBy() can be more readable.
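To see why a key-based distinct can avoid the intermediate groups, the sketch below tracks only the keys it has already seen in a HashSet and yields the first element for each new key. `DistinctByKey` is a hypothetical helper written for this illustration, and the tuple-based sample data is made up. On .NET 6 and later, the built-in `Enumerable.DistinctBy` provides the same kind of key-based distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var products = new[]
{
    (Id: 1, Name: "Laptop", Category: "Electronics"),
    (Id: 2, Name: "Mouse", Category: "Electronics"),
    (Id: 4, Name: "Laptop", Category: "Electronics"),
    (Id: 5, Name: "Desk", Category: "Furniture")
};

// Value tuples compare by value, so (Name, Category) works as a composite key.
var distinct = DistinctByKey(products, p => (p.Name, p.Category)).ToList();

string ids = string.Join(", ", distinct.Select(p => p.Id));
Console.WriteLine(ids); // 1, 2, 5

// Hypothetical helper: streams the first element seen for each key.
// Only the keys are stored, not a group of elements per key.
static IEnumerable<T> DistinctByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
{
    var seen = new HashSet<TKey>();
    foreach (var item in source)
    {
        // HashSet.Add returns false when the key is already present.
        if (seen.Add(keySelector(item)))
        {
            yield return item;
        }
    }
}
```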
Distinct() on an anonymous-type projection is often the most concise and performant approach. Only resort to custom comparers when you need to reuse the logic or have very specific equality rules.