Count frequency of words in a list and sort by frequency
Counting Word Frequencies in Python Lists and Sorting by Occurrence

Learn several Python techniques for counting how often each word appears in a list and sorting the words from most to least frequent. This guide covers manual dictionary counting, the standard-library collections.Counter class, and pandas, with an example for each.
Counting word frequencies is a common task in data analysis, natural language processing, and general programming. Whether you're analyzing text data, preparing for an interview, or simply organizing information, understanding how to efficiently count and sort word occurrences in a Python list is a valuable skill. This article will explore several methods, ranging from basic Python constructs to specialized modules, demonstrating how to achieve this with clarity and efficiency.
Understanding the Problem: Counting and Sorting
The core problem involves two main parts:
- Counting: Iterating through a list of words and keeping track of how many times each unique word appears.
- Sorting: Arranging the unique words by their counts, usually in descending order (most frequent first). When two words share the same count, a tie-breaker decides their relative order; alphabetical order is a common choice.
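A minimal sketch of the alphabetical tie-break, assuming the counts are already in a dictionary (the word list is invented for illustration). Negating the count in the sort key gives descending frequency, while the word itself sorts ascending among equal counts.

```python
words = ['pear', 'apple', 'banana', 'apple', 'pear', 'banana', 'kiwi']

# Count occurrences with a plain dictionary.
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# Sort key: negative count first (most frequent first),
# then the word itself (alphabetical among equal counts).
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))

print(ranked)
# [('apple', 2), ('banana', 2), ('pear', 2), ('kiwi', 1)]
```

Using reverse=True here instead of negating the count would also reverse the alphabetical order of tied words, which is why the count is negated.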
flowchart TD
    A[Start with a List of Words] --> B{Initialize Counter}
    B --> C{Iterate through each Word}
    C --> D{Is Word in Counter?}
    D -- Yes --> E[Increment Word Count]
    D -- No --> F[Add Word with Count 1]
    E --> C
    F --> C
    C -- All Words Processed --> G[Convert Counts to Sortable Format]
    G --> H{"Sort by Count (Descending)"}
    H --> I[Output Sorted Words and Frequencies]
    I --> J[End]
Flowchart illustrating the process of counting and sorting word frequencies.
Method 1: Using a Dictionary for Manual Counting
The most fundamental approach involves using a dictionary. Dictionaries are perfect for this task because they map unique keys (our words) to values (their counts). You iterate through the list, and for each word, you check if it's already in the dictionary. If it is, you increment its count; otherwise, you add it with a count of 1.
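Written out literally, the membership check described above looks like the sketch below; the listing that follows condenses the same if/else into a single dict.get call.

```python
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

word_counts = {}
for word in words:
    if word in word_counts:
        # Seen before: bump the existing count.
        word_counts[word] += 1
    else:
        # First occurrence: start the count at 1.
        word_counts[word] = 1

print(word_counts)
# {'apple': 3, 'banana': 2, 'orange': 1}
```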
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
word_counts = {}
for word in words:
    # get() returns 0 for a word not seen yet, so no KeyError is raised
    word_counts[word] = word_counts.get(word, 0) + 1
# Sort the (word, count) pairs by count, highest first
sorted_word_counts = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
print(sorted_word_counts)
Counting and sorting word frequencies using a dictionary.
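When counts tie, the sort above keeps a predictable order: sorted() is stable, and the Python documentation notes that reverse=True preserves that stability. Because dictionaries keep insertion order (guaranteed since Python 3.7), tied words come out in the order they first appeared in the list. A small check, using a word list invented for illustration:

```python
words = ['cherry', 'apple', 'banana', 'apple', 'cherry', 'banana', 'date']

word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

# sorted() is stable, so equal counts keep dictionary (first-seen) order,
# even with reverse=True.
sorted_word_counts = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)

print(sorted_word_counts)
# [('cherry', 2), ('apple', 2), ('banana', 2), ('date', 1)]
```

If first-seen order is not what you want for ties, an explicit secondary key (such as the word itself) makes the ordering independent of input order.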
In the counting loop, the dict.get(key, default_value) method does the key work. It returns the value stored for key, or default_value (in this case, 0) if the key is missing, so the first occurrence of a word never raises a KeyError.
Method 2: Leveraging collections.Counter
Python's collections module provides Counter, a dictionary subclass designed for exactly this kind of counting. Passing it an iterable counts every element in one step. It is concise and efficient, and it is the idiomatic choice for most frequency-counting tasks in Python.
from collections import Counter
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# Count word frequencies
word_counts = Counter(words)
# Get items sorted by frequency (most_common returns sorted list of tuples)
sorted_word_counts = word_counts.most_common()
print(sorted_word_counts)
Using collections.Counter for efficient word frequency counting and sorting.
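Passing n to most_common() trims the result to the top entries. The sketch below uses a slightly longer list than the example above (invented for illustration) so that two words tie; Counter orders equal counts by first appearance.

```python
from collections import Counter

words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple', 'kiwi', 'orange']

word_counts = Counter(words)

# Only the two most frequent words.
top_two = word_counts.most_common(2)
print(top_two)
# [('apple', 3), ('banana', 2)]

# Everything, most common first; 'banana' precedes 'orange' because it appeared first.
everything = word_counts.most_common()
print(everything)
# [('apple', 3), ('banana', 2), ('orange', 2), ('kiwi', 1)]
```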
Counter objects have a most_common(n) method that returns a list of the n most common elements and their counts, ordered from most to least common; elements with equal counts appear in the order they were first encountered. If n is omitted or None, it returns all elements.
Method 3: Using pandas for Data Analysis (Advanced)
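For larger datasets, or when the words are already part of a data analysis workflow, the third-party pandas library offers a one-call solution: Series.value_counts() counts each distinct value and, by default, returns the counts sorted in descending order. It is overkill for a simple list, but it fits naturally into more complex pipelines.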
import pandas as pd
words = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
# Convert list to a pandas Series
word_series = pd.Series(words)
# Count value occurrences and sort
word_counts = word_series.value_counts()
# Convert to a list of tuples if needed
sorted_word_counts = list(word_counts.items())
print(word_counts)
print(sorted_word_counts)
Counting and sorting word frequencies using pandas value_counts().
Each method offers a different balance of conciseness, performance, and flexibility. For most simple cases, collections.Counter is the idiomatic Python choice. A manual dictionary loop gives the most control over how words are counted, while pandas is the better fit when the data already lives in, or is headed for, a larger data pipeline.