TCL: sort a list of strings based on a portion of the string
TCL: Sorting a List of Strings Based on a Substring

Learn how to effectively sort a list of strings in TCL by extracting and comparing specific portions of each string, using custom comparison procedures.
Sorting lists is a fundamental operation in any programming language, and TCL provides powerful mechanisms for this. While simple lexicographical or numerical sorting is straightforward, scenarios often arise where you need to sort strings based on a specific segment or pattern within them. This article will guide you through the process of sorting a list of strings in TCL by a defined portion of each string, leveraging custom comparison functions.
Understanding TCL's lsort Command
The lsort command is TCL's primary tool for sorting lists. It's highly flexible, allowing you to specify various sorting options, including comparison types (e.g., -integer, -real, -dictionary), sort order (e.g., -increasing, -decreasing), and, crucially, a custom comparison command using the -command option. This -command option is what enables sorting based on complex criteria, such as a substring.
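Before reaching for -command, it helps to see what the built-in options already do. The following is a minimal sketch; the lists words and numbers are made up for illustration.

```tcl
# Illustrative sample data (not from the article)
set words   {item10 item2 Item1 item20}
set numbers {10 9 100 1}

# Default (-ascii) compares character by character, so uppercase sorts first
puts [lsort $words]                          ;# Item1 item10 item2 item20

# -dictionary ignores case (except to break ties) and compares
# embedded numbers numerically
puts [lsort -dictionary $words]              ;# Item1 item2 item10 item20

# Without -integer, numbers are compared as strings
puts [lsort $numbers]                        ;# 1 10 100 9
puts [lsort -integer $numbers]               ;# 1 9 10 100
puts [lsort -integer -decreasing $numbers]   ;# 100 10 9 1
```

When the sort key is the whole element, these options are usually enough; -command becomes necessary only when the key has to be carved out of each string first.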
flowchart TD
    A[Start] --> B{Input List of Strings}
    B --> C{Define Custom Comparison Logic}
    C --> D{Extract Substring from String 1}
    C --> E{Extract Substring from String 2}
    D & E --> F{Compare Substrings}
    F -->|Result < 0| G[String 1 comes before String 2]
    F -->|Result > 0| H[String 2 comes before String 1]
    F -->|Result = 0| I[Order is indifferent]
    G & H & I --> J{lsort Applies Comparison Iteratively}
    J --> K[Sorted List Output]
    K --> L[End]
Flowchart of custom list sorting process in TCL
Implementing a Custom Comparison Procedure
To sort by a substring, you need to provide lsort with a procedure that takes two list elements as arguments and returns an integer less than, equal to, or greater than zero, depending on whether the first element should sort before, together with, or after the second (conventionally -1, 0, or 1). This procedure extracts the relevant substring from each element and then compares those substrings. The string range and regexp commands are well suited to substring extraction.
proc compareBySubstring {a b} {
    # Example: sort by the characters at indices 2 through 4 (0-indexed, inclusive)
    set sub_a [string range $a 2 4]
    set sub_b [string range $b 2 4]
    # Perform a standard string comparison on the extracted substrings
    return [string compare $sub_a $sub_b]
}

set my_list {
    "apple_123_red"
    "banana_456_yellow"
    "cherry_789_green"
    "date_101_brown"
    "grape_234_purple"
}

# join prints the elements on one line instead of the raw multi-line literal
puts "Original List: [join $my_list]"
set sorted_list [lsort -command compareBySubstring $my_list]
puts "Sorted by substring (index 2-4): $sorted_list"

# Expected output (keys compared: 'ple', 'nan', 'err', 'te_', 'ape'):
# Original List: apple_123_red banana_456_yellow cherry_789_green date_101_brown grape_234_purple
# Sorted by substring (index 2-4): grape_234_purple cherry_789_green banana_456_yellow apple_123_red date_101_brown
TCL code for sorting a list by a fixed-position substring.
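The indices in a comparison procedure do not have to be hard-coded. Because lsort appends the two elements to whatever command prefix it is given, extra leading arguments can be bound with list. The sketch below assumes a hypothetical helper named compareByRange and a made-up records list.

```tcl
# Hypothetical helper: compare two strings on the characters from index
# first through index last (0-based, inclusive)
proc compareByRange {first last a b} {
    set sub_a [string range $a $first $last]
    set sub_b [string range $b $first $last]
    return [string compare $sub_a $sub_b]
}

# Illustrative data: a two-letter code, an underscore, then three digits
set records {ab_300 cd_100 ef_200}

# lsort calls: compareByRange 3 5 <element1> <element2>
puts [lsort -command [list compareByRange 3 5] $records]   ;# cd_100 ef_200 ab_300

# The same procedure sorts on a different slice without being rewritten
puts [lsort -command [list compareByRange 0 1] $records]   ;# ab_300 cd_100 ef_200
```

Binding the range through the command prefix keeps one procedure reusable for any fixed-position key.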
Sorting by Pattern-Based Substrings (Regular Expressions)
For more complex substring extraction, especially when the position isn't fixed, regular expressions are invaluable. The regexp command can extract matching groups, which can then be used for comparison. This is particularly useful for data that follows a specific format but might have variable-length components.
proc compareByRegexMatch {a b} {
    # Example: sort by the number between the first pair of underscores
    # Pattern: _(\d+)_ (captures one or more digits between underscores)
    if {![regexp {_(\d+)_} $a -> match_a] || ![regexp {_(\d+)_} $b -> match_b]} {
        # If either string lacks the pattern, fall back to comparing the full strings
        return [string compare $a $b]
    }
    # Convert the matches to integers; scan %d reads them as decimal,
    # so a key with leading zeros such as "010" is not treated as octal
    scan $match_a %d num_a
    scan $match_b %d num_b
    if {$num_a < $num_b} {
        return -1
    } elseif {$num_a > $num_b} {
        return 1
    } else {
        return 0
    }
}

set my_list_regex {
    "item_10_alpha"
    "product_2_beta"
    "asset_100_gamma"
    "component_5_delta"
}

puts "Original List (Regex): [join $my_list_regex]"
set sorted_list_regex [lsort -command compareByRegexMatch $my_list_regex]
puts "Sorted by regex match (number after first underscore): $sorted_list_regex"

# Expected output (sorted by 2, 5, 10, 100):
# Original List (Regex): item_10_alpha product_2_beta asset_100_gamma component_5_delta
# Sorted by regex match (number after first underscore): product_2_beta component_5_delta item_10_alpha asset_100_gamma
TCL code for sorting a list by a substring extracted using regular expressions.
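Falling back to a full string comparison whenever one side fails to match can interleave matching and non-matching elements unpredictably. One possible alternative, sketched here under the assumption that non-matching strings should always sort last, uses two hypothetical procedures, extractNumber and compareNumberOrLast, and a made-up items list.

```tcl
# Hypothetical helper: return the integer between the first pair of
# underscores, or an empty string when the pattern does not match
proc extractNumber {s} {
    if {[regexp {_(\d+)_} $s -> digits]} {
        # scan %d reads decimal, so "007" becomes 7
        return [scan $digits %d]
    }
    return ""
}

# Hypothetical comparison: numeric order for matches, non-matches last
proc compareNumberOrLast {a b} {
    set ka [extractNumber $a]
    set kb [extractNumber $b]
    set hasA [expr {$ka ne ""}]
    set hasB [expr {$kb ne ""}]
    if {$hasA && $hasB} {
        # Yields -1, 0, or 1 for numeric keys
        return [expr {($ka > $kb) - ($ka < $kb)}]
    }
    if {$hasA} { return -1 }          ;# only a matched: a goes first
    if {$hasB} { return 1 }           ;# only b matched: b goes first
    return [string compare $a $b]     ;# neither matched: plain string order
}

set items {item_10_alpha no_number product_2_beta asset_007_gamma also-unmatched}
puts [lsort -command compareNumberOrLast $items]
# product_2_beta asset_007_gamma item_10_alpha also-unmatched no_number
```

The three cases (both match, one matches, neither matches) give every pair a consistent answer, which is what lsort needs to produce a predictable order.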
When using regexp for sorting, always consider how to handle strings that do not match the expected pattern. Your comparison procedure should have robust error handling or a fallback mechanism to prevent unexpected sorting behavior or errors.
Performance Considerations
For very large lists, the performance of your custom comparison procedure can become a factor. With -command, lsort invokes the procedure once per comparison, so each element's substring is extracted or its regular expression is run many times over the course of one sort. If performance is critical, consider pre-processing the list into a temporary structure (e.g., a list of lists or a dictionary) where each element contains both the original string and its extracted sort key, so that each key is computed only once. Then, sort this temporary structure and reconstruct the final list.
# Example of pre-processing for performance (conceptual)
set my_list {"apple_123_red" "banana_456_yellow"}
set temp_list {}
foreach item $my_list {
    set sort_key [string range $item 2 4] ;# Or use regexp
    lappend temp_list [list $sort_key $item]
}

# Now sort temp_list based on the first element (the sort_key)
set sorted_temp_list [lsort -index 0 $temp_list]

# Reconstruct the final list
set final_sorted_list {}
foreach item_pair $sorted_temp_list {
    lappend final_sorted_list [lindex $item_pair 1]
}
puts "Pre-processed and sorted list: $final_sorted_list"
Conceptual code demonstrating pre-processing for performance optimization.
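The same pre-processing idea carries over to numeric keys extracted with regexp. The sketch below pairs each string with its number once, then lets the built-in -integer and -index options do the comparison, so no Tcl procedure runs during the sort. The items list is illustrative, and this version simply skips elements that do not match the pattern, which is an assumption rather than a requirement.

```tcl
set items {item_10_alpha product_2_beta asset_100_gamma component_5_delta}

# Decorate: build {key original} pairs, extracting each key exactly once
set decorated {}
foreach item $items {
    if {[regexp {_(\d+)_} $item -> digits]} {
        lappend decorated [list [scan $digits %d] $item]
    }
    # Assumption: elements without a number are left out of the result
}

# Sort on element 0 of each pair using the built-in integer comparison,
# then undecorate by keeping only the original string
set sorted {}
foreach pair [lsort -integer -index 0 $decorated] {
    lappend sorted [lindex $pair 1]
}

puts $sorted   ;# product_2_beta component_5_delta item_10_alpha asset_100_gamma
```

For small lists the difference is negligible, so the simpler -command form is often the more readable choice.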