How to Append a Column to a Pandas DataFrame

Learn various methods to efficiently add new columns to your Pandas DataFrames, from simple assignments to more complex operations using apply() and insert().
Appending a new column to a Pandas DataFrame is a fundamental operation in data manipulation. Whether you're adding calculated values, external data, or simply initializing a new feature, Pandas provides several flexible and efficient ways to achieve this. This article will guide you through the most common and effective methods, helping you choose the best approach for your specific use case.
Understanding DataFrame Column Assignment
At its core, adding a column to a Pandas DataFrame often involves direct assignment. Pandas DataFrames behave much like dictionaries in this regard, where column names act as keys. If you assign a Series to a new column name, Pandas aligns the data by index label, so each value lands in the matching row, and rows with no matching label are filled with NaN. Lists and NumPy arrays carry no index, so they are assigned by position and must have exactly as many elements as the DataFrame has rows; otherwise Pandas raises a ValueError.
flowchart TD
    A[Start] --> B{New Column Data Available?}
    B -- Yes --> C["Choose Method: Direct Assignment, .loc, .assign(), .insert()"]
    C --> D{Data Length Matches DataFrame Rows?}
    D -- Yes --> E[Assign Data to New Column Name]
    D -- No --> F[Handle Mismatch: Error or NaN]
    E --> G[New Column Added]
    F --> G
    B -- No --> H[Generate or Calculate New Column Data]
    H --> C
    G --> I[End]
Workflow for appending a column to a Pandas DataFrame.
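To make the mismatch branch of the workflow concrete, here is a minimal sketch of what happens when the new data does not line up with the DataFrame. The column names too_short and partial are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A list is matched by position, so a wrong length is rejected.
try:
    df["too_short"] = [10, 20]
    list_error = None
except ValueError as exc:
    list_error = str(exc)

# A Series is matched by index label: labels missing from the Series
# become NaN, and labels that do not exist in df (here, 5) are dropped.
df["partial"] = pd.Series([10, 20], index=[0, 5])

print(list_error)
print(df)
```

The failed list assignment leaves the DataFrame untouched, while the Series assignment succeeds and fills the unmatched rows with NaN.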
Method 1: Direct Assignment (Simplest Approach)
The most straightforward way to add a new column is by direct assignment using square bracket notation. This method is highly intuitive and widely used for its simplicity. You can assign a scalar value, a Python list, a NumPy array, or a Pandas Series to a new column name.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print("Original DataFrame:\n", df)
# 1. Assign a scalar value (broadcasts to all rows)
df['C'] = 10
print("\nAfter assigning scalar 'C':\n", df)
# 2. Assign a list/array (must match DataFrame length)
df['D'] = [7, 8, 9]
print("\nAfter assigning list 'D':\n", df)
# 3. Assign a Pandas Series (aligns by index)
df['E'] = pd.Series([100, 200, 300], index=[0, 1, 2])
print("\nAfter assigning Series 'E':\n", df)
# 4. Assign a Series with different index (NaN for non-matches)
df['F'] = pd.Series([111, 222], index=[0, 2])
print("\nAfter assigning Series 'F' with different index:\n", df)
Demonstrates direct assignment of various data types to new DataFrame columns.
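Index alignment matters most when the DataFrame does not use the default 0, 1, 2 index. The sketch below, using a hypothetical scores Series, contrasts assigning a Series directly (matched by label) with assigning its underlying NumPy array (matched by position).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
scores = pd.Series([30, 10, 20], index=["z", "x", "y"])

# Assigning the Series matches on index labels, not on order.
df["by_label"] = scores

# Converting to a NumPy array discards the labels, so values
# are placed in the order they appear in the Series.
df["by_position"] = scores.to_numpy()

print(df)
```

If the source Series has a different index from the DataFrame and you want positional assignment, converting it with .to_numpy() first is one way to avoid unexpected NaN values.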
Method 2: Using .loc for Assignment
The .loc accessor is primarily used for label-based indexing and selection, but it can also be used to assign values to new columns. This method is particularly useful when you want to assign values based on specific row conditions or when you want to emphasize that you are modifying the DataFrame by label.
import pandas as pd
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60]
})
print("Original DataFrame:\n", df)
# Assign a new column 'G' using .loc
df.loc[:, 'G'] = [1, 2, 3]
print("\nAfter assigning 'G' with .loc:\n", df)
# Assign a new column 'H' based on a condition
df.loc[df['A'] > 15, 'H'] = 'High A'
df.loc[df['A'] <= 15, 'H'] = 'Low A'
print("\nAfter conditional assignment of 'H' with .loc:\n", df)
Using .loc to add new columns, including conditional assignment.
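A common variation on conditional assignment is to create the column with a default value first and then overwrite only the rows that match a condition. This is a sketch; the column name label and the threshold of 20 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]})

# Start with a default value for every row.
df["label"] = "small"

# Overwrite only the rows where the condition holds.
df.loc[df["A"] >= 20, "label"] = "large"

print(df)
```

Because every row receives a value up front, the column never contains NaN, even if a later condition fails to cover some rows.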
Method 3: Using .assign() (Functional Approach)
The .assign() method is a powerful and readable way to create new columns. It returns a new DataFrame with the new columns added, leaving the original DataFrame unchanged. This makes it ideal for method chaining and functional programming paradigms, as it avoids in-place modification.
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
print("Original DataFrame:\n", df)
# Assign a new column 'I' using .assign()
df_new = df.assign(I=[7, 8, 9])
print("\nAfter assigning 'I' with .assign():\n", df_new)
print("Original DataFrame (unchanged):\n", df)
# Assign multiple columns and use existing columns in calculations
df_new_calc = df.assign(
    J=df['A'] * 2,
    K=lambda x: x['B'] + x['J']  # 'x' refers to the DataFrame being built
)
print("\nAfter assigning multiple calculated columns with .assign():\n", df_new_calc)
Adding columns using the .assign() method, including a column calculated from another newly assigned column.
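Since .assign() returns a new DataFrame, it can be combined with other calls in a single chain. The following is a minimal sketch; the column names total and share_of_A and the filter threshold are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

result = (
    df
    # Each lambda receives the DataFrame as it exists at that step.
    .assign(total=lambda d: d["A"] + d["B"])
    .assign(share_of_A=lambda d: d["A"] / d["total"])
    # Keep only the rows whose total exceeds 5.
    .loc[lambda d: d["total"] > 5]
)

print(result)
print(df)  # the original still has only columns A and B
```

Using lambdas rather than referring to df directly keeps each step tied to the intermediate result, which matters once a step in the chain filters or reorders rows.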
The .assign() method is particularly useful when you want to create new columns based on calculations involving existing columns, and you prefer not to modify the DataFrame in-place. It enhances code readability and maintainability.
Method 4: Using .insert() (Specific Position)
Unlike direct assignment or .assign(), the .insert() method allows you to add a column at a specific integer position within the DataFrame. This is useful when the order of columns is important for presentation or subsequent operations. It modifies the DataFrame in-place.
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
print("Original DataFrame:\n", df)
# Insert a new column 'X' at position 1 (after 'A', before 'B')
df.insert(loc=1, column='X', value=[10, 20, 30])
print("\nAfter inserting 'X' at position 1:\n", df)
# Insert another column 'Y' at the beginning (position 0)
df.insert(loc=0, column='Y', value=['foo', 'bar', 'baz'])
print("\nAfter inserting 'Y' at position 0:\n", df)
Demonstrates how to insert columns at specific positions using .insert().
The loc parameter in .insert() is the integer position at which the new column is placed; existing columns at that position and beyond shift one place to the right. For example, loc=0 inserts at the very beginning, and loc=len(df.columns) inserts at the very end. Valid values satisfy 0 <= loc <= len(df.columns).
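Two details of .insert() are easy to trip over: it returns None because it works in-place, and by default it refuses to insert a column name that already exists. The sketch below illustrates both, using made-up column names Z and first.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# loc=len(df.columns) places the new column at the far right.
df.insert(loc=len(df.columns), column="Z", value=[7, 8, 9])

# Inserting a name that already exists raises ValueError by default.
try:
    df.insert(loc=0, column="A", value=[0, 0, 0])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

# insert() modifies df in-place and returns None, so do not reassign df.
returned = df.insert(loc=0, column="first", value=0)

print(duplicate_error)
print(df)
```

Writing df = df.insert(...) would therefore replace the DataFrame with None, which is a common mistake when switching between .insert() and .assign().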