Python pandas: 3 Smallest & 3 Largest Values

Mastering Pandas: Extracting the N Smallest and Largest Values from DataFrames

Learn how to efficiently retrieve the N smallest and N largest values or rows from a pandas DataFrame, a useful skill for data analysis and outlier detection.

In data analysis, it's often necessary to quickly identify the extreme values within a dataset. Whether you're looking for the highest-performing products, the lowest-scoring students, or simply the range of your data, pandas provides two methods for extracting the N smallest and N largest values or rows from a DataFrame. This article walks through nsmallest() and nlargest() and demonstrates their usage with practical examples.

Understanding nsmallest() and nlargest()

The nsmallest() and nlargest() methods are available on both DataFrame and Series. On a DataFrame, they return the n rows with the smallest or largest values in the specified columns, in ascending order for nsmallest() and descending order for nlargest(). The result is equivalent to calling sort_values() on those columns followed by head(n), but the dedicated methods are more performant, especially on large datasets, because they don't require sorting the entire DataFrame. These methods are particularly useful for tasks like finding outliers, identifying top or bottom performers, or filtering data based on extreme values.
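
As a rough illustration of how nlargest() relates to a full sort, the sketch below compares it with sort_values() followed by head(). The `scores` DataFrame and its values are made up for this example, and the comparison is only meant to hold for data without ties in the ordering column.

```python
import pandas as pd

# Hypothetical data for illustration only; all scores are distinct
scores = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara", "Dev", "Elle"],
    "Score": [72, 95, 88, 61, 79],
})

# Select the 2 rows with the highest Score
top_two = scores.nlargest(2, "Score")

# Same result via a full descending sort, then keeping the first 2 rows
top_two_sorted = scores.sort_values("Score", ascending=False).head(2)

print(top_two)
print(top_two.equals(top_two_sorted))
```

Both approaches keep the original index labels, so the selected rows can still be traced back to their positions in the source DataFrame.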

flowchart TD
    A[Start] --> B{DataFrame Loaded?}
    B -- Yes --> C["Specify Column(s)"]
    C --> D["Specify N (number of rows)"]
    D --> E{"Call nsmallest() or nlargest()"}
    E --> F["Result: DataFrame with N extreme rows"]
    B -- No --> G[Load DataFrame]
    G --> A

Workflow for extracting N smallest/largest values from a DataFrame.

Basic Usage: Finding N Smallest/Largest Values in a Single Column

Let's start with a simple example. We'll create a DataFrame and then use nsmallest() and nlargest() to find the rows with the 3 smallest and 3 largest values in a specific column. The primary arguments for these methods are n (the number of rows to retrieve) and columns (the column label or list of labels to order by).

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
    'Score': [85, 92, 78, 95, 88, 70, 91, 83, 90, 75],
    'Age': [25, 30, 22, 28, 35, 20, 32, 26, 29, 23]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Find the 3 smallest scores
smallest_scores = df.nsmallest(3, 'Score')
print("\n3 Smallest Scores:")
print(smallest_scores)

# Find the 3 largest scores
largest_scores = df.nlargest(3, 'Score')
print("\n3 Largest Scores:")
print(largest_scores)

Example of using nsmallest() and nlargest() on a single column.
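
When only the values are needed rather than whole rows, the same methods can be called on a Series. The following is a minimal sketch using a made-up `scores` Series indexed by name; the variable names are illustrative and not part of the example above.

```python
import pandas as pd

# Hypothetical scores, indexed by name, for illustration only
scores = pd.Series(
    [85, 92, 78, 95, 88, 70],
    index=["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
)

lowest_three = scores.nsmallest(3)   # ordered ascending: smallest value first
highest_three = scores.nlargest(3)   # ordered descending: largest value first

print(lowest_three.tolist())
print(highest_three.tolist())
```

The returned Series keep their index labels, so `lowest_three.index` identifies who the values belong to.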

Handling Ties and Multiple Columns

What happens if there are ties in the column you're ordering by? By default (keep='first'), pandas returns the tied rows that appear first in the DataFrame. Passing keep='last' prefers the last occurrences instead, and keep='all' retains every row tied at the cutoff, which can return more than n rows. You can also specify multiple columns. In this case, nsmallest() and nlargest() order by the first column, then use the second column to break ties, and so on.

import pandas as pd

data_ties = {
    'Product': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Price': [10, 20, 10, 30, 20, 10],
    'Quantity': [5, 2, 8, 1, 3, 6]
}
df_ties = pd.DataFrame(data_ties)

print("Original DataFrame with Ties:")
print(df_ties)

# Find 3 smallest prices, then by quantity for ties
smallest_prices_ties = df_ties.nsmallest(3, ['Price', 'Quantity'])
print("\n3 Smallest Prices (then by Quantity for ties):")
print(smallest_prices_ties)

# Find 3 largest prices, then by quantity for ties
largest_prices_ties = df_ties.nlargest(3, ['Price', 'Quantity'])
print("\n3 Largest Prices (then by Quantity for ties):")
print(largest_prices_ties)

Using nsmallest() and nlargest() with multiple columns to handle ties.
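
Ties can also be controlled with the keep parameter instead of a second column. The sketch below uses a small made-up `prices` DataFrame with a three-way tie at the lowest price to show how the three options differ; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with a three-way tie at the lowest price
prices = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [10, 20, 10, 10],
})

# keep="first" (the default): prefer the tied rows that occur first
first_two = prices.nsmallest(2, "Price", keep="first")

# keep="last": prefer the tied rows that occur last
last_two = prices.nsmallest(2, "Price", keep="last")

# keep="all": keep every row tied at the cutoff, so more than n rows may come back
all_tied = prices.nsmallest(2, "Price", keep="all")

print(sorted(first_two["Product"]))
print(sorted(last_two["Product"]))
print(sorted(all_tied["Product"]))
```

Using keep='all' is a reasonable choice when dropping an arbitrary tied row would be misleading, for example when reporting every product that shares the lowest price.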