Insert a row to pandas dataframe


How to Insert a Row into a Pandas DataFrame


Learn various methods to effectively insert new rows into a Pandas DataFrame, including using loc, concat, and _append for different scenarios.

Inserting a new row into a Pandas DataFrame is a common operation in data manipulation. While DataFrames are designed for efficient column-wise operations, adding rows can sometimes be tricky, especially when considering performance for large datasets. This article explores several robust methods to insert rows, catering to different use cases and DataFrame sizes.

Understanding DataFrame Storage and Performance

It's crucial to understand how pandas stores data. A DataFrame's columns are backed by fixed-size arrays (typically NumPy arrays), so a row cannot be added in place: every insertion allocates new arrays and copies the existing data, whether the operation returns a new DataFrame (pd.concat()) or enlarges the existing object (df.loc). Inserting one row at a time into a large DataFrame is therefore inefficient, because each insertion copies all the rows accumulated so far and the total cost grows quadratically with the number of insertions. For optimal performance, especially with many insertions, collect the new rows first (for example, in a Python list of dicts) and insert them in a single batch operation.
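A minimal sketch of the batch approach, assuming each new record can be expressed as a dict keyed by column name (the generated `R4`, `R5`, ... records below are made-up sample data): collect the records in a list, build one DataFrame, and concatenate once.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# Slow pattern, shown only for contrast: every pd.concat() call copies all
# rows accumulated so far, so total work grows quadratically.
# for i in range(4, 1004):
#     df = pd.concat([df, pd.DataFrame([{'col1': i, 'col2': f'R{i}'}])],
#                    ignore_index=True)

# Faster pattern: gather rows in a plain Python list (no DataFrame copies) ...
rows = []
for i in range(4, 1004):
    # In real code each record might come from a file, an API, or a computation
    rows.append({'col1': i, 'col2': f'R{i}'})

# ... then build one DataFrame and concatenate a single time.
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df.shape)  # (1003, 2)
```

Appending to a Python list is cheap, so the only expensive copy happens in the final pd.concat() call.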

flowchart TD
    A[Start] --> B{Need to insert a row?}
    B -->|Yes| C{Single row or multiple rows?}
    C -->|Single| D[Use `loc` for specific index or `concat` for end]
    C -->|Multiple| E[Collect rows, then use `pd.concat` or `_append`]
    D --> F[New DataFrame created]
    E --> F
    F --> G[End]

Decision flow for inserting rows into a Pandas DataFrame

Method 1: Using loc for Specific Index Insertion

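The loc accessor is primarily used for label-based indexing and selection, but it can also add a single row, and the result is quite readable. When you assign with df.loc[new_label] = new_row_data (a dict keyed by column name, or a list with one value per column), pandas updates the existing row if new_label is already in the index; otherwise it appends a new row with that label at the end of the DataFrame. loc does not insert at a position by itself, so to place a row between existing rows you can give it an intermediate label, sort the index, and then reset it, as in the example below.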

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['A', 'B', 'C']
})
print("Original DataFrame:\n", df)

# Insert a new row between the rows labeled 1 and 2
new_row_data = {'col1': 99, 'col2': 'Z'}
df.loc[1.5] = new_row_data  # A new label is appended at the end; 1.5 also turns the index into floats
# Sort so label 1.5 lands between 1 and 2, then renumber the index 0..n-1
df = df.sort_index().reset_index(drop=True)

print("\nDataFrame after inserting with loc and re-indexing:\n", df)

# Append at the end: len(df) is an unused label here because the index is 0..n-1
df.loc[len(df)] = {'col1': 100, 'col2': 'X'}
print("\nDataFrame after inserting at end with loc:\n", df)

Inserting rows using the loc accessor and re-indexing.
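If you need a row at a specific integer position rather than between two labels, one option is to slice the DataFrame with iloc and stitch the pieces back together with pd.concat(). This avoids the float index created by a label such as 1.5. The insert_row_at() helper below is hypothetical, not a pandas function; it is a sketch that assumes you want a fresh 0..n-1 index afterwards.

```python
import pandas as pd


def insert_row_at(df: pd.DataFrame, pos: int, row: dict) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with `row` placed at
    integer position `pos` (0 = top, len(df) = bottom), index renumbered."""
    if not 0 <= pos <= len(df):
        raise IndexError(f"position {pos} out of range for {len(df)} rows")
    # One-row DataFrame with the same column order as df
    new_row = pd.DataFrame([row], columns=df.columns)
    # Split by position, then concatenate: rows before, new row, rows after
    return pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})
df = insert_row_at(df, 1, {'col1': 99, 'col2': 'Z'})
print(df)  # col1 is now 1, 99, 2, 3
```

Like the loc approach, this copies the whole DataFrame, so it suits occasional insertions rather than tight loops.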

Method 2: Appending Rows with pd.concat()

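The pd.concat() function is the recommended way to add one or more rows to a DataFrame, and it is the replacement pandas itself points to for the removed DataFrame.append(). It concatenates DataFrames along a given axis. To add rows, create a new DataFrame (or convert a Series to a DataFrame) containing the row(s) you want to insert, then concatenate it with the original DataFrame along axis=0 (the default). Passing ignore_index=True discards the original index labels and numbers the result 0..n-1. Each call still copies the data, so pass all new rows in one call rather than calling pd.concat() inside a loop.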

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['A', 'B', 'C']
})
print("Original DataFrame:\n", df)

# Create a new row as a Series, then convert to DataFrame
new_row_series = pd.Series({'col1': 4, 'col2': 'D'})
df_new_row = pd.DataFrame([new_row_series]) # Wrap in list to make it a row

# Concatenate the original DataFrame with the new row DataFrame
df_concat = pd.concat([df, df_new_row], ignore_index=True)
print("\nDataFrame after concatenating a single row:\n", df_concat)

# Adding multiple rows at once
new_rows_data = [
    {'col1': 5, 'col2': 'E'},
    {'col1': 6, 'col2': 'F'}
]
df_multiple_new_rows = pd.DataFrame(new_rows_data)

df_final = pd.concat([df_concat, df_multiple_new_rows], ignore_index=True)
print("\nDataFrame after concatenating multiple rows:\n", df_final)

Appending single and multiple rows using pd.concat().
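pd.concat() matches rows up by column label, not by position. This short sketch (the extra col3 is a made-up column for illustration) shows what happens when the new row lacks one of the existing columns and introduces another: the result contains the union of the columns, and cells with no source value become NaN.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

# The new row omits 'col2' and adds 'col3', which df does not have
new_rows = pd.DataFrame([{'col1': 4, 'col3': True}])

# Columns are aligned by name; missing cells are filled with NaN
result = pd.concat([df, new_rows], ignore_index=True)
print(result)
```

If you want to keep only the columns the frames share instead, pd.concat() also accepts join='inner'.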

Method 3: Using the Private DataFrame._append() Method

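DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; the documented replacement is pd.concat(). DataFrame._append() is the private method that append() delegated to, and it still exists in current releases, so some code calls it as a drop-in substitute. It can append a Series, a dict, or another DataFrame directly and is essentially a convenience wrapper around pd.concat(). Because the leading underscore marks it as internal, it is not part of the public API and can change or disappear without a deprecation warning, so prefer pd.concat() in code you need to maintain.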

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': ['A', 'B', 'C']
})
print("Original DataFrame:\n", df)

# Create a new row as a Series
new_row_series = pd.Series({'col1': 4, 'col2': 'D'})

# Append the series (converted to DataFrame internally by _append)
df_appended = df._append(new_row_series, ignore_index=True)
print("\nDataFrame after appending a Series with _append:\n", df_appended)

# Append another DataFrame
new_df_to_append = pd.DataFrame([{'col1': 5, 'col2': 'E'}])
df_final_append = df_appended._append(new_df_to_append, ignore_index=True)
print("\nDataFrame after appending another DataFrame with _append:\n", df_final_append)

Appending rows using DataFrame._append().
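To move existing _append() calls onto the public API, a small wrapper that normalizes its input and delegates to pd.concat() is often enough. append_rows() below is a hypothetical helper, not part of pandas; it is a sketch that only covers the common cases (a Series or dict for one row, a list of dicts or Series for several rows, or a DataFrame) and always behaves like ignore_index=True.

```python
import pandas as pd


def append_rows(df: pd.DataFrame, other) -> pd.DataFrame:
    """Hypothetical public-API stand-in for df._append(other, ignore_index=True)."""
    if isinstance(other, (pd.Series, dict)):
        # Wrap a single row in a list so pandas treats it as one row
        other = pd.DataFrame([other])
    elif isinstance(other, list):
        # A list of dicts or Series becomes one row per element
        other = pd.DataFrame(other)
    elif not isinstance(other, pd.DataFrame):
        raise TypeError(f"cannot append object of type {type(other).__name__}")
    return pd.concat([df, other], ignore_index=True)


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']})

df_appended = append_rows(df, pd.Series({'col1': 4, 'col2': 'D'}))
df_final = append_rows(df_appended, pd.DataFrame([{'col1': 5, 'col2': 'E'}]))
df_many = append_rows(df_final, [{'col1': 6, 'col2': 'F'}, {'col1': 7, 'col2': 'G'}])
print(df_many)
```

Keeping the conversion in one place means a future change (for example, preserving the original index) only has to be made once.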