Get unique 3 digit zips from a list C#
Extracting Unique 3-Digit ZIP Codes from a List in C#

Learn how to efficiently parse a list of strings to identify and extract unique 3-digit ZIP code prefixes using C#, with a focus on string manipulation and regular expressions.
Working with geographical data often involves processing ZIP codes. A common requirement is to extract unique prefixes, such as the first three digits, which can represent larger geographical areas or distribution zones. This article will guide you through various C# techniques to achieve this, from basic string manipulation to more robust regular expression patterns, ensuring you can efficiently handle diverse input formats.
Understanding the Challenge: Diverse ZIP Code Formats
ZIP codes can appear in many forms within a dataset. You might encounter full 5-digit codes, ZIP+4 codes, or even codes embedded within longer strings. The goal is to consistently extract the initial three digits, regardless of the surrounding characters or the presence of hyphens. For example, from "12345", "12345-6789", or "Area Code 12345", we want to isolate "123".
flowchart TD
    A[Input List of Strings] --> B{Iterate through each string}
    B --> C{Isolate potential ZIP code part}
    C --> D{Extract first 3 digits}
    D --> E{Add to unique collection}
    E --> B
    B -- All strings processed --> F[Unique 3-Digit ZIPs]
Workflow for extracting unique 3-digit ZIP codes.
Method 1: String Manipulation and Substring
For relatively clean data where ZIP codes are consistently at the beginning or easily isolated, simple string manipulation methods like Substring can be effective. This approach is straightforward but less flexible if the format varies significantly.
using System;
using System.Collections.Generic;
using System.Linq;

public class ZipCodeExtractor
{
    public static HashSet<string> GetUniqueThreeDigitZipsSubstring(List<string> zipCodeStrings)
    {
        var uniqueZips = new HashSet<string>();
        foreach (string zipString in zipCodeStrings)
        {
            // Assuming the ZIP code is the first 5 characters and we need the first 3
            if (!string.IsNullOrWhiteSpace(zipString) && zipString.Length >= 3)
            {
                string threeDigitZip = zipString.Substring(0, 3);
                uniqueZips.Add(threeDigitZip);
            }
        }
        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string> { "12345", "12399-0001", "45678", "12300", "78910" };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsSubstring(zips);
        Console.WriteLine("Unique 3-digit ZIP prefixes (Substring method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}
C# code using Substring to extract unique 3-digit ZIP prefixes.
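One caveat with the Substring approach above: it takes whatever the first three characters happen to be, so an entry such as "N/A" or " 12345" yields a non-numeric or misaligned prefix. The following is a minimal sketch of a stricter variant, assuming entries may carry surrounding whitespace and that only all-digit prefixes are wanted. The names ZipPrefixGuard, TryGetPrefix, and GetUniquePrefixes are illustrative and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipPrefixGuard // hypothetical helper name
{
    // Returns the first three characters of the trimmed entry if they are
    // all ASCII digits; otherwise returns null.
    public static string TryGetPrefix(string entry)
    {
        if (string.IsNullOrWhiteSpace(entry)) return null;

        string trimmed = entry.Trim();
        if (trimmed.Length < 3) return null;

        string prefix = trimmed.Substring(0, 3);
        return prefix.All(c => c >= '0' && c <= '9') ? prefix : null;
    }

    // Collects the distinct valid prefixes; entries that fail the check are skipped.
    public static HashSet<string> GetUniquePrefixes(IEnumerable<string> entries)
    {
        var result = new HashSet<string>();
        foreach (string entry in entries)
        {
            string prefix = TryGetPrefix(entry);
            if (prefix != null) result.Add(prefix);
        }
        return result;
    }

    public static void Main()
    {
        var input = new List<string> { " 12345", "12399-0001", "N/A", "12", null, "45678" };
        foreach (string p in GetUniquePrefixes(input).OrderBy(x => x))
        {
            Console.WriteLine(p); // prints 123, then 456
        }
    }
}
```

This still assumes the ZIP code sits at the start of each entry; it only rejects entries whose leading characters are not digits.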
The HashSet<string> collection is ideal for storing unique items, as it automatically handles duplicates, ensuring that each 3-digit ZIP code is stored only once.
Method 2: Robust Extraction with Regular Expressions
For more complex scenarios where ZIP codes might be embedded in text, contain optional hyphens, or have varying lengths, regular expressions provide a powerful and flexible solution. We can define a pattern that specifically targets valid 5-digit or ZIP+4 codes and then extract the first three digits from the match.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ZipCodeExtractorRegex
{
    public static HashSet<string> GetUniqueThreeDigitZipsRegex(List<string> zipCodeStrings)
    {
        var uniqueZips = new HashSet<string>();
        // Regex to match a 5-digit ZIP code, optionally followed by a hyphen and 4 digits
        // The first 3 digits are captured in group 1
        string pattern = @"\b(\d{3})\d{2}(?:-\d{4})?\b";
        Regex regex = new Regex(pattern);
        foreach (string zipString in zipCodeStrings)
        {
            Match match = regex.Match(zipString);
            if (match.Success)
            {
                string threeDigitZip = match.Groups[1].Value;
                uniqueZips.Add(threeDigitZip);
            }
        }
        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string>
        {
            "The address is 12345 Main St.",
            "Shipping to 12399-0001, USA",
            "Customer in 45678 area",
            "Another one: 12300",
            "Invalid: 123",
            "Valid: 78910-1234",
            "No zip here"
        };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsRegex(zips);
        Console.WriteLine("Unique 3-digit ZIP prefixes (Regex method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}
C# code using Regex to extract unique 3-digit ZIP prefixes from various string formats.
The regex pattern \b(\d{3})\d{2}(?:-\d{4})?\b works as follows:
\b: Word boundary, ensures we match whole words.
(\d{3}): Captures exactly three digits (the prefix we want).
\d{2}: Matches the next two digits (completing the 5-digit code).
(?:-\d{4})?: Optionally matches a hyphen followed by four digits (non-capturing group).
\b: Another word boundary.
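The breakdown above can be checked against a few edge cases. The sketch below is illustrative only: the class name ZipPatternDemo, the method AllPrefixes, and the sample strings are assumptions. It uses Regex.Matches rather than Regex.Match, because Match returns only the first ZIP code when a string contains several.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZipPatternDemo // hypothetical name, for illustration
{
    private static readonly Regex ZipRegex = new Regex(@"\b(\d{3})\d{2}(?:-\d{4})?\b");

    // Collects the 3-digit prefix of every ZIP code found in the text, not just the first.
    public static List<string> AllPrefixes(string text)
    {
        var prefixes = new List<string>();
        foreach (Match m in ZipRegex.Matches(text))
        {
            prefixes.Add(m.Groups[1].Value);
        }
        return prefixes;
    }

    public static void Main()
    {
        string[] samples =
        {
            "From 12345 to 67890-1234", // two ZIP codes in one string
            "Order 123456",             // six digits: no word boundary after the fifth digit
            "Invalid: 123"              // too short to be a ZIP code
        };
        foreach (string s in samples)
        {
            Console.WriteLine($"{s} => [{string.Join(", ", AllPrefixes(s))}]");
        }
        // From 12345 to 67890-1234 => [123, 678]
        // Order 123456 => []
        // Invalid: 123 => []
    }
}
```

Note that the pattern is permissive about what follows the five digits: in "12345-678" the optional group fails, but the hyphen still forms a word boundary, so "12345" matches and yields "123". If malformed extensions should be rejected, the pattern would need a stricter trailing condition.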
Combining Approaches for Best Results
For optimal performance and readability, especially with large datasets, consider a hybrid approach. You might pre-process the data to clean obvious non-ZIP code entries, then apply regular expressions for robust extraction, and finally use LINQ for concise aggregation.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ZipCodeExtractorCombined
{
    public static HashSet<string> GetUniqueThreeDigitZipsCombined(List<string> zipCodeStrings)
    {
        string pattern = @"\b(\d{3})\d{2}(?:-\d{4})?\b";
        Regex regex = new Regex(pattern, RegexOptions.Compiled);
        var uniqueZips = zipCodeStrings
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => regex.Match(s))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .ToHashSet(); // Available in .NET Core 2.0+ and .NET Framework 4.7.2+
        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string>
        {
            "12345", "12399-0001", "45678", "12300", "78910",
            "The address is 12345 Main St.",
            "Shipping to 12399-0001, USA",
            "Customer in 45678 area",
            "Another one: 12300",
            "Invalid: 123",
            "Valid: 78910-1234",
            "No zip here",
            null, // Example of null entry
            " " // Example of whitespace entry
        };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsCombined(zips);
        Console.WriteLine("Unique 3-digit ZIP prefixes (Combined LINQ/Regex method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}
C# code demonstrating a combined LINQ and Regex approach for efficient extraction.
With RegexOptions.Compiled, the regex engine compiles the expression to MSIL instead of interpreting it, which can improve performance for repeated use. However, it incurs a startup cost, so it's best for patterns that will be used many times.
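One way to pay that startup cost only once is to build the compiled Regex a single time and reuse it, rather than constructing it on every call as the combined method above does. The following is a minimal sketch, assuming the pattern never changes at runtime; the names CachedZipExtractor and GetUniquePrefixes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CachedZipExtractor // hypothetical name
{
    // Constructed once, when the class is first used, and shared by every call.
    private static readonly Regex ZipRegex =
        new Regex(@"\b(\d{3})\d{2}(?:-\d{4})?\b", RegexOptions.Compiled);

    public static HashSet<string> GetUniquePrefixes(IEnumerable<string> entries)
    {
        // The HashSet<string> constructor is used here instead of ToHashSet()
        // so the sketch also compiles on older framework versions.
        return new HashSet<string>(
            entries
                .Where(s => !string.IsNullOrWhiteSpace(s))
                .Select(s => ZipRegex.Match(s))
                .Where(m => m.Success)
                .Select(m => m.Groups[1].Value));
    }

    public static void Main()
    {
        var batchA = new List<string> { "12345", "Shipping to 12399-0001, USA" };
        var batchB = new List<string> { "45678", null, "No zip here" };

        // Both calls reuse the same compiled Regex instance.
        Console.WriteLine(string.Join(", ", GetUniquePrefixes(batchA))); // prints 123
        Console.WriteLine(string.Join(", ", GetUniquePrefixes(batchB))); // prints 456
    }
}
```

Whether the compiled option pays off depends on how many inputs are processed; for a handful of strings, the default interpreted Regex may well be faster overall.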