Get unique 3 digit zips from a list C#


Extracting Unique 3-Digit ZIP Codes from a List in C#


Learn how to efficiently parse a list of strings to identify and extract unique 3-digit ZIP code prefixes using C#, with a focus on string manipulation and regular expressions.

Working with geographical data often involves processing ZIP codes. A common requirement is to extract the unique three-digit prefixes, which identify larger geographical areas and are often used as distribution zones. This article walks through several C# techniques for doing so, from basic string manipulation to regular expressions that tolerate varied input formats.

Understanding the Challenge: Diverse ZIP Code Formats

ZIP codes can appear in many forms within a dataset. You might encounter full 5-digit codes, ZIP+4 codes, or even codes embedded within longer strings. The goal is to consistently extract the initial three digits, regardless of the surrounding characters or the presence of hyphens. For example, from "12345", "12345-6789", or "Area Code 12345", we want to isolate "123".

flowchart TD
    A[Input List of Strings] --> B{Iterate through each string}
    B --> C{Isolate potential ZIP code part}
    C --> D{Extract first 3 digits}
    D --> E{Add to unique collection}
    E --> B
    B -- All strings processed --> F[Unique 3-Digit ZIPs]

Workflow for extracting unique 3-digit ZIP codes.
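
One format the examples above do not cover is a ZIP code that has passed through a numeric column or spreadsheet and lost its leading zeros, so that 02134 arrives as 2134. The sketch below is a minimal illustration of restoring the padding before taking the prefix. The ZipPrefixFromNumber class and TryGetPrefix method are hypothetical names, and the sketch assumes the values are plain 5-digit ZIP codes stored as integers.

```csharp
using System;
using System.Globalization;

public static class ZipPrefixFromNumber
{
    // Hypothetical helper: assumes 'zip' is a 5-digit ZIP code stored as an integer,
    // so any missing leading zeros are restored by left-padding to 5 digits.
    public static bool TryGetPrefix(int zip, out string prefix)
    {
        prefix = null;
        if (zip < 0 || zip > 99999)
        {
            return false; // Cannot be a 5-digit ZIP code
        }

        // "D5" pads with leading zeros: 2134 -> "02134"
        string padded = zip.ToString("D5", CultureInfo.InvariantCulture);
        prefix = padded.Substring(0, 3);
        return true;
    }
}
```

For example, TryGetPrefix(2134, out var p) sets p to "021", whereas taking the first three characters of "2134" directly would give "213".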

Method 1: String Manipulation and Substring

For relatively clean data where every entry starts with the ZIP code, simple string manipulation with Substring can be effective. This approach is straightforward, but it does no searching: it only looks at the first three characters, so an input such as "Area Code 12345" cannot be handled correctly.

using System;
using System.Collections.Generic;
using System.Linq;

public class ZipCodeExtractor
{
    public static HashSet<string> GetUniqueThreeDigitZipsSubstring(List<string> zipCodeStrings)
    {
        var uniqueZips = new HashSet<string>();

        foreach (string zipString in zipCodeStrings)
        {
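            // Assumes the entry starts with the ZIP code; surrounding whitespace is trimmed first
            string trimmed = zipString?.Trim();
            // Require three leading ASCII digits so entries like "N/A" or "Area Code 12345" are skipped
            if (trimmed != null && trimmed.Length >= 3
                && trimmed.Take(3).All(c => c >= '0' && c <= '9'))
            {
                uniqueZips.Add(trimmed.Substring(0, 3));
            }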
        }
        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string> { "12345", "12399-0001", "45678", "12300", "78910" };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsSubstring(zips);

        Console.WriteLine("Unique 3-digit ZIP prefixes (Substring method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}

C# code using Substring to extract unique 3-digit ZIP prefixes.

Method 2: Robust Extraction with Regular Expressions

For more complex scenarios where ZIP codes might be embedded in text or carry an optional ZIP+4 suffix, regular expressions provide a more flexible solution. We can define a pattern that targets tokens shaped like 5-digit or ZIP+4 codes and captures the first three digits of the match. Note that the pattern checks format only, not whether the ZIP code actually exists.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ZipCodeExtractorRegex
{
    public static HashSet<string> GetUniqueThreeDigitZipsRegex(List<string> zipCodeStrings)
    {
        var uniqueZips = new HashSet<string>();
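        // Matches a standalone 5-digit ZIP code, optionally followed by a hyphen and 4 digits.
        // The first 3 digits are captured in group 1.
        // \b only checks word boundaries, so a malformed value such as "12345-678" still matches as "12345".
        string pattern = @"\b(\d{3})\d{2}(?:-\d{4})?\b";
        Regex regex = new Regex(pattern);

        foreach (string zipString in zipCodeStrings)
        {
            // Match returns only the first ZIP-like token; null entries would throw, so skip them
            if (zipString == null) continue;
            Match match = regex.Match(zipString);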
            if (match.Success)
            {
                string threeDigitZip = match.Groups[1].Value;
                uniqueZips.Add(threeDigitZip);
            }
        }
        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string>
        {
            "The address is 12345 Main St.",
            "Shipping to 12399-0001, USA",
            "Customer in 45678 area",
            "Another one: 12300",
            "Invalid: 123",
            "Valid: 78910-1234",
            "No zip here"
        };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsRegex(zips);

        Console.WriteLine("Unique 3-digit ZIP prefixes (Regex method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}

C# code using Regex to extract unique 3-digit ZIP prefixes from various string formats.
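
The listing above stops at the first ZIP-like token in each string. If a single entry can mention several ZIP codes, Regex.Matches can collect them all. The following is a minimal sketch under that assumption. ZipPrefixMatches and GetAllPrefixes are hypothetical names, and the pattern swaps \d for [0-9] because \d in .NET also matches non-ASCII Unicode digits.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ZipPrefixMatches
{
    // [0-9] restricts matching to ASCII digits; in .NET, \d also matches
    // other Unicode decimal digits (for example full-width digits).
    private static readonly Regex ZipPattern =
        new Regex(@"\b([0-9]{3})[0-9]{2}(?:-[0-9]{4})?\b");

    // Hypothetical helper: collects the prefix of every ZIP-like token in every string,
    // not just the first token per string.
    public static HashSet<string> GetAllPrefixes(IEnumerable<string> inputs)
    {
        var prefixes = new HashSet<string>();
        foreach (string input in inputs)
        {
            if (string.IsNullOrWhiteSpace(input))
            {
                continue; // Regex.Matches throws on null input
            }

            foreach (Match match in ZipPattern.Matches(input))
            {
                prefixes.Add(match.Groups[1].Value);
            }
        }
        return prefixes;
    }
}
```

Given the input "From 12345 to 45678-0001", this yields both 123 and 456, where a single Match call would stop after 123.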

Combining Approaches for Best Results

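For large datasets, the filtering, extraction, and de-duplication steps can be expressed as a single LINQ pipeline: skip null or blank entries, apply the regular expression, and collect the captured prefixes into a HashSet. Compiling the regular expression with RegexOptions.Compiled trades a slower start-up for faster matching, which pays off only when the same Regex instance is reused across many inputs.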

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class ZipCodeExtractorCombined
{
    // Built once and reused: a compiled Regex is costly to construct, so avoid re-creating it on every call
    private static readonly Regex ZipRegex =
        new Regex(@"\b(\d{3})\d{2}(?:-\d{4})?\b", RegexOptions.Compiled);

    public static HashSet<string> GetUniqueThreeDigitZipsCombined(List<string> zipCodeStrings)
    {
        var uniqueZips = zipCodeStrings
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(s => ZipRegex.Match(s))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .ToHashSet(); // Available in .NET Core 2.0+ and .NET Framework 4.7.2+

        return uniqueZips;
    }

    public static void Main(string[] args)
    {
        List<string> zips = new List<string>
        {
            "12345", "12399-0001", "45678", "12300", "78910",
            "The address is 12345 Main St.",
            "Shipping to 12399-0001, USA",
            "Customer in 45678 area",
            "Another one: 12300",
            "Invalid: 123",
            "Valid: 78910-1234",
            "No zip here",
            null, // Example of null entry
            "   " // Example of whitespace entry
        };
        HashSet<string> uniquePrefixes = GetUniqueThreeDigitZipsCombined(zips);

        Console.WriteLine("Unique 3-digit ZIP prefixes (Combined LINQ/Regex method):");
        foreach (string prefix in uniquePrefixes)
        {
            Console.WriteLine(prefix);
        }
    }
}

C# code demonstrating a combined LINQ and Regex approach for efficient extraction.
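
HashSet<T> does not guarantee any enumeration order, so the printed prefixes may not appear sorted. If stable output matters, for reports or tests for instance, one option is to copy the result into a SortedSet<string> with an ordinal comparer. This is a sketch rather than a required step, and ZipPrefixOrdering and ToSortedPrefixes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ZipPrefixOrdering
{
    // Hypothetical helper: returns the prefixes in ascending ordinal order.
    // For fixed-width digit strings, ordinal order equals numeric order ("021" < "123" < "456").
    public static SortedSet<string> ToSortedPrefixes(IEnumerable<string> prefixes)
    {
        return new SortedSet<string>(prefixes, StringComparer.Ordinal);
    }
}
```

Passing the set returned by any of the methods above to ToSortedPrefixes gives a collection that always enumerates in the same order, regardless of the order in which the prefixes were first seen.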