In Python’s Pandas, isin() is a method that tests whether each element of a DataFrame or Series is present in a list, set, or another Series. It’s commonly used to filter rows whose column values match specific values.
Example
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
# Check if 'Name' is in the list
filtered_df = df[df['Name'].isin(['Alice', 'David'])]
print(filtered_df)
Output:
    Name  Age
0  Alice   25
3  David   40
What it does
The isin() method checks whether each element in the ‘Name’ column exists in the provided list (['Alice', 'David']). It returns a boolean mask that can be used to filter the DataFrame, keeping only the rows where the values match.
- True result: Rows where the column value is present in the provided list are returned.
- False result: Rows where the column value is not in the list are filtered out.
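To make the mask itself visible, here is a minimal sketch that reuses the sample data above and stores the result of isin() in a variable (the name mask is only illustrative) before indexing with it:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# isin() returns a boolean Series with the same index as df['Name']
mask = df['Name'].isin(['Alice', 'David'])
print(mask.tolist())  # [True, False, False, True]

# Indexing df with the mask keeps only the rows marked True
filtered_df = df[mask]
print(filtered_df)
```

Keeping the mask in its own variable can make longer filters easier to read and reuse.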
Examples
Example 1: Using isin() with a list of values
import pandas as pd
# Sample DataFrame
data = {'Fruit': ['Apple', 'Banana', 'Mango', 'Orange'],
'Quantity': [5, 3, 8, 2]}
df = pd.DataFrame(data)
# Filter rows where 'Fruit' is in the list
selected_fruits = df[df['Fruit'].isin(['Apple', 'Mango'])]
print(selected_fruits)
Output:
   Fruit  Quantity
0  Apple         5
2  Mango         8
This filters the DataFrame to include only rows where ‘Fruit’ is either ‘Apple’ or ‘Mango’.
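The same filter also works with a set instead of a list. A minimal sketch, reusing the fruit data above; the variable name wanted is hypothetical and not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Mango', 'Orange'],
                   'Quantity': [5, 3, 8, 2]})

# A set of allowed values (illustrative name)
wanted = {'Apple', 'Mango'}

# isin() accepts a set just like a list
selected_fruits = df[df['Fruit'].isin(wanted)]
print(selected_fruits['Fruit'].tolist())  # ['Apple', 'Mango']
```

Note that isin() expects a list-like argument: to test for a single value, wrap it in a list, for example isin(['Apple']), rather than passing a bare string.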
Example 2: Using isin() with a Series
import pandas as pd
# Sample DataFrame
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
'Population': [8000000, 4000000, 2700000, 2300000]}
df = pd.DataFrame(data)
# Another Series of cities
cities_to_check = pd.Series(['Chicago', 'Houston'])
# Filter using `isin()`
selected_cities = df[df['City'].isin(cities_to_check)]
print(selected_cities)
Output:
      City  Population
2  Chicago     2700000
3  Houston     2300000
Here, isin() checks if each city in the ‘City’ column is present in the cities_to_check Series.
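When a Series is passed to a column's isin(), only its values are compared; its index labels play no role. The sketch below illustrates this by giving cities_to_check an unrelated index (the labels 10 and 11 are arbitrary, chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
                   'Population': [8000000, 4000000, 2700000, 2300000]})

# Same values as before, but with an index that does not match df's index
cities_to_check = pd.Series(['Chicago', 'Houston'], index=[10, 11])

# Series.isin() compares values only; the index of cities_to_check is ignored
mask = df['City'].isin(cities_to_check)
print(mask.tolist())  # [False, False, True, True]

selected_cities = df[mask]
print(selected_cities['City'].tolist())  # ['Chicago', 'Houston']
```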
Example 3: Inverting the filter using ~
import pandas as pd
# Sample DataFrame
data = {'Color': ['Red', 'Blue', 'Green', 'Yellow'],
'Code': [1, 2, 3, 4]}
df = pd.DataFrame(data)
# Exclude rows where 'Color' is in the list
excluded_colors = df[~df['Color'].isin(['Red', 'Green'])]
print(excluded_colors)
Output:
    Color  Code
1    Blue     2
3  Yellow     4
Using ~ (the element-wise NOT operator) inverts the mask, returning rows where the values are not in the provided list.
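One way to see what the inversion does is to apply the mask and its negation to the same DataFrame: together they split the rows into two non-overlapping groups. A minimal sketch using the color data above (the names mask, matched, and excluded are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Yellow'],
                   'Code': [1, 2, 3, 4]})

mask = df['Color'].isin(['Red', 'Green'])

matched = df[mask]    # rows whose Color is in the list
excluded = df[~mask]  # every remaining row

print(matched['Color'].tolist())   # ['Red', 'Green']
print(excluded['Color'].tolist())  # ['Blue', 'Yellow']
```

Python's not keyword cannot be used here, because it tries to reduce the whole Series to a single True or False and raises a ValueError; ~ negates each element individually.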
Example 4: Using isin() with multiple columns
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
# Filter using conditions on multiple columns
selected_rows = df[df['Name'].isin(['Alice', 'David']) & df['City'].isin(['New York', 'Houston'])]
print(selected_rows)
Output:
    Name  Age      City
0  Alice   25  New York
3  David   40   Houston
This example shows how to use isin() on multiple columns, combining the two masks with & so that only rows satisfying both conditions are kept.
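Two variations on the multi-column filter may be useful. The sketch below first combines masks with | (OR) instead of &, then shows an alternative spelling of the AND filter that passes a dict to DataFrame.isin(), where each key is a column name and each value is the list allowed for that column. The variable names are illustrative, and this is one possible approach rather than the only one:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# OR: keep rows where either condition holds (parentheses keep each mask separate)
either = df[(df['Name'].isin(['Alice'])) | (df['City'].isin(['Chicago']))]
print(either['Name'].tolist())  # ['Alice', 'Charlie']

# AND via DataFrame.isin() with a dict of allowed values per column
allowed = {'Name': ['Alice', 'David'], 'City': ['New York', 'Houston']}
per_column = df[['Name', 'City']].isin(allowed)  # boolean DataFrame
both = df[per_column.all(axis=1)]                # True only if every column matches
print(both['Name'].tolist())  # ['Alice', 'David']
```

Selecting only the 'Name' and 'City' columns before calling isin() matters here: columns that are missing from the dict, such as 'Age', would otherwise come back as False and make all(axis=1) reject every row.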