What is the most efficient way of counting occurrences in pandas?
Efficiently Counting Occurrences in Pandas DataFrames

Discover the most efficient methods for counting occurrences of values in pandas Series and DataFrames, from basic value counts to advanced grouping techniques.
Counting occurrences of values is a fundamental operation in data analysis, especially when working with categorical data or exploring data distributions. Pandas, with its powerful DataFrame and Series objects, offers several highly optimized methods to achieve this efficiently. This article will guide you through the most common and performant ways to count occurrences, helping you choose the best approach for your specific needs.
The value_counts() Method: Your Go-To for Series
The value_counts() method is arguably the most straightforward and frequently used function for counting unique values in a pandas Series. It returns a Series containing the counts of unique values, sorted in descending order by default, and it is highly efficient for single columns.
import pandas as pd
data = {'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'D', 'C'],
        'Value': [10, 20, 10, 30, 20, 10, 40, 30]}
df = pd.DataFrame(data)
# Count occurrences in the 'Category' column
category_counts = df['Category'].value_counts()
print(category_counts)
Using value_counts() on a single DataFrame column.
value_counts() excludes NaN values by default. To include them, set dropna=False. You can also normalize the counts to show proportions by setting normalize=True.
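A minimal sketch of these two options, using a small Series with a missing value so the effect of dropna=False is visible:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'A', np.nan, 'B', 'A'])

# dropna=False keeps NaN as its own row in the result
counts = s.value_counts(dropna=False)

# normalize=True returns proportions instead of raw counts
# (NaN is excluded here, since dropna defaults to True)
props = s.value_counts(normalize=True)

print(counts)
print(props)
```

With this data, counts includes a NaN entry of 1, and props reports A as 0.6 and B as 0.4.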
Counting Across Multiple Columns with groupby()
When you need to count occurrences based on combinations of values across multiple columns, groupby() is the method of choice. It allows complex aggregations and provides immense flexibility: group by one or more columns, then apply an aggregation function such as size() or count().
import pandas as pd
data = {'City': ['NY', 'LA', 'NY', 'LA', 'SF', 'NY'],
        'Product': ['A', 'B', 'A', 'C', 'B', 'C'],
        'Sales': [100, 150, 120, 200, 180, 250]}
df = pd.DataFrame(data)
# Count occurrences of City-Product combinations
combination_counts = df.groupby(['City', 'Product']).size().reset_index(name='Count')
print(combination_counts)
Counting occurrences using groupby() on multiple columns.
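The choice between size() and count() matters when the data contains missing values: size() counts every row in each group, while count() counts only non-null entries, column by column. A small sketch with one NaN to illustrate the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'NY', 'LA'],
                   'Sales': [100, np.nan, 150]})

# size() counts all rows per group, NaN included
sizes = df.groupby('City').size()

# count() counts only the non-null values in the chosen column
non_null = df.groupby('City')['Sales'].count()

print(sizes)
print(non_null)
```

Here NY has a size of 2 but a Sales count of 1, because one of its Sales values is missing.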
flowchart TD
    A[Start with DataFrame] --> B{Select Columns for Grouping}
    B --> C[Apply `groupby()`]
    C --> D[Apply `size()` or `count()`]
    D --> E[Optional: `reset_index()`]
    E --> F[Result: Counts by Group]
    F --> G[End]
Workflow for counting occurrences using groupby().
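As a related shortcut, pandas 1.1 and later also expose value_counts() on DataFrames, which counts unique row combinations directly, much like groupby(...).size() sorted in descending order. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA'],
                   'Product': ['A', 'B', 'A', 'C']})

# DataFrame.value_counts counts unique combinations of the given columns,
# returning a Series with a MultiIndex, sorted by count descending
combo = df.value_counts(['City', 'Product'])
print(combo)
```

With this data, the (NY, A) pair appears twice and each other pair once.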
Cross-Tabulations with pd.crosstab()
For a quick and clear summary of the frequency of two (or more) factors, pd.crosstab() is an excellent tool. By default it computes a frequency table of the factors; pass an array of values and an aggregation function to compute other aggregations instead.
import pandas as pd
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Region': ['North', 'South', 'North', 'East', 'South']}
df = pd.DataFrame(data)
# Create a cross-tabulation of Gender and Region
crosstab_df = pd.crosstab(df['Gender'], df['Region'])
print(crosstab_df)
Using pd.crosstab() for frequency tables.
pd.crosstab() is particularly useful for categorical data and can be extended to include margins, normalization, and more complex aggregations.
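A short sketch of the margins and normalization options, reusing the Gender/Region data from the example above:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
                   'Region': ['North', 'South', 'North', 'East', 'South']})

# margins=True appends an 'All' row and column with the totals
with_totals = pd.crosstab(df['Gender'], df['Region'], margins=True)

# normalize='index' converts each row to proportions that sum to 1
row_props = pd.crosstab(df['Gender'], df['Region'], normalize='index')

print(with_totals)
print(row_props)
```

The grand total in the 'All' cell equals the number of rows in the DataFrame, and each row of row_props sums to 1.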