Mastering Pandas: Extracting the N Smallest and Largest Values from DataFrames

Learn how to efficiently retrieve the N smallest and N largest values or rows from a Pandas DataFrame, a useful skill for data analysis and outlier detection.
In data analysis, it's often necessary to quickly identify the extreme values within a dataset. Whether you're looking for the highest-performing products, the lowest-scoring students, or simply trying to understand the range of your data, Pandas provides intuitive methods to extract the N smallest and N largest values or rows from a DataFrame. This article walks through the nsmallest() and nlargest() methods and demonstrates their usage with practical examples.
Understanding nsmallest() and nlargest()
The nsmallest() and nlargest() methods return the first n rows of a DataFrame ordered by the specified columns, in ascending order for nsmallest() and descending order for nlargest(). The pandas documentation describes them as equivalent to sorting with sort_values() and then calling head(n), but more performant, which matters most on large datasets. Both methods are also available on a Series. They are particularly useful for tasks like finding outliers, identifying top or bottom performers, or filtering data based on extreme values.
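The equivalence with sorting can be checked directly. The following is a minimal sketch on a small made-up DataFrame (the names and scores are illustrative only), comparing each method against sort_values() followed by head():

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Score": [85, 92, 78, 95, 88],
})

# nlargest() orders rows by Score in descending order and keeps the first 3
top3 = scores.nlargest(3, "Score")
top3_sorted = scores.sort_values("Score", ascending=False).head(3)

# nsmallest() orders rows by Score in ascending order and keeps the first 3
bottom3 = scores.nsmallest(3, "Score")
bottom3_sorted = scores.sort_values("Score").head(3)

# With no tied scores, both approaches select the same rows in the same order
same_top = top3.equals(top3_sorted)
same_bottom = bottom3.equals(bottom3_sorted)
print(same_top, same_bottom)
```

Because the scores here are all distinct, the two approaches agree exactly; with tied values the row order can depend on the tie-handling rules discussed later.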
flowchart TD
    A[Start] --> B{DataFrame Loaded?}
    B -- Yes --> C["Specify Column(s)"]
    C --> D["Specify N (number of values)"]
    D --> E{"Call nsmallest() or nlargest()"}
    E --> F[Result: DataFrame with N extreme rows]
    B -- No --> G[Load DataFrame]
    G --> A
Workflow for extracting N smallest/largest values from a DataFrame.
Basic Usage: Finding N Smallest/Largest Values in a Single Column
Let's start with a simple example. We'll create a DataFrame and then use nsmallest() and nlargest() to find the 3 smallest and 3 largest values in a specific column. The primary arguments for these methods are n (the number of rows to retrieve) and columns (the column label or list of column labels to order by).
import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
    'Score': [85, 92, 78, 95, 88, 70, 91, 83, 90, 75],
    'Age': [25, 30, 22, 28, 35, 20, 32, 26, 29, 23]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Find the rows with the 3 smallest scores (ascending by Score)
smallest_scores = df.nsmallest(3, 'Score')
print("\n3 Smallest Scores:")
print(smallest_scores)

# Find the rows with the 3 largest scores (descending by Score)
largest_scores = df.nlargest(3, 'Score')
print("\n3 Largest Scores:")
print(largest_scores)
Example of using nsmallest() and nlargest() on a single column.
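When only the values of one column are needed rather than whole rows, the same methods can be called on the column itself. The sketch below reuses the Name and Score data from the example above (the variable names are illustrative) and shows that the Series version returns values labelled by the original row index, which can then be used to look up the full rows:

```python
import pandas as pd

# Same Name/Score data as the single-column example
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Heidi", "Ivan", "Judy"],
    "Score": [85, 92, 78, 95, 88, 70, 91, 83, 90, 75],
})

# Series.nlargest() returns a Series of values, indexed by original row label
top_scores = df["Score"].nlargest(3)
print(top_scores)

# The index of that Series can be used to recover the complete rows
top_rows = df.loc[top_scores.index]
print(top_rows)

# On a Series, n is optional and defaults to 5
default_top = df["Score"].nlargest()
print(len(default_top))
```

Calling the method on the DataFrame is usually the more direct route when the full rows are wanted; the Series form is handy when only the values matter.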
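Note that n is a required argument for DataFrame.nsmallest() and DataFrame.nlargest(); only the Series versions of these methods default to n=5 when it is omitted. Specifying n explicitly in both cases keeps the intent clear and avoids unexpected results.
Handling Ties and Multiple Columns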
What happens if there are ties in the column you're ordering by? By default (keep='first'), Pandas gives priority to the rows that appear first in the DataFrame when values are equal. You can also pass a list of columns. In that case, nsmallest() and nlargest() order by the first column and use each subsequent column only to break ties in the columns before it.
import pandas as pd

data_ties = {
    'Product': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Price': [10, 20, 10, 30, 20, 10],
    'Quantity': [5, 2, 8, 1, 3, 6]
}
df_ties = pd.DataFrame(data_ties)
print("Original DataFrame with Ties:")
print(df_ties)

# Find 3 smallest prices, then by quantity for ties
smallest_prices_ties = df_ties.nsmallest(3, ['Price', 'Quantity'])
print("\n3 Smallest Prices (then by Quantity for ties):")
print(smallest_prices_ties)

# Find 3 largest prices, then by quantity for ties
largest_prices_ties = df_ties.nlargest(3, ['Price', 'Quantity'])
print("\n3 Largest Prices (then by Quantity for ties):")
print(largest_prices_ties)
Using nsmallest() and nlargest() with multiple columns to handle ties.
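To see what the second column actually changes, it can help to run the same query with and without it. This is a small sketch on the Product/Price/Quantity data from the example above; the variable names are illustrative:

```python
import pandas as pd

# Same data as the ties example: three products share the lowest price
df_ties = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E", "F"],
    "Price": [10, 20, 10, 30, 20, 10],
    "Quantity": [5, 2, 8, 1, 3, 6],
})

# Price only: tied rows come back in their original DataFrame order
by_price = df_ties.nsmallest(3, "Price")
print(list(by_price["Product"]))

# Price, then Quantity: tied prices are ordered by ascending Quantity
by_price_qty = df_ties.nsmallest(3, ["Price", "Quantity"])
print(list(by_price_qty["Product"]))
```

Both calls select the same three products here, since exactly three rows share the lowest price; only the order within the tie differs.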
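The keep parameter controls how ties are handled: 'first' (the default) gives priority to the earliest occurrences, 'last' gives priority to the latest occurrences, and 'all' (available in newer Pandas versions) keeps every row tied with the last selected value, so the result may contain more than n rows.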
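A short sketch of the three keep options, using a made-up price table in which three products share the lowest price and only two rows are requested:

```python
import pandas as pd

# Hypothetical data: products A, C and F are tied at the lowest price
prices = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E", "F"],
    "Price": [10, 20, 10, 30, 20, 10],
})

# keep='first': prefer the tied rows that appear earliest
first_two = prices.nsmallest(2, "Price", keep="first")

# keep='last': prefer the tied rows that appear latest
last_two = prices.nsmallest(2, "Price", keep="last")

# keep='all': keep every row tied with the last selected value,
# so more than n rows can be returned
all_ties = prices.nsmallest(2, "Price", keep="all")

print(sorted(first_two["Product"]))
print(sorted(last_two["Product"]))
print(len(all_ties))
```

Using keep='all' is the safer choice when dropping a tied row would be misleading, at the cost of a result whose length is no longer fixed.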