The str.contains() method in Pandas tests whether a pattern or regular expression is contained within each string element of a Series.
Example
import pandas as pd
# create a pandas Series
cities = pd.Series(['New York', 'London', 'Tokyo', 'Paris', 'Moscow'])
# use contains() to check which city names contain the substring 'o'
contains_o = cities.str.contains('o')
print(contains_o)
'''
Output
0 True
1 True
2 True
3 False
4 True
dtype: bool
'''
str.contains() Syntax
The syntax of the str.contains() method in Pandas is:
Series.str.contains(pat, case=True, na=nan, regex=True)
str.contains() Arguments
The str.contains() method takes the following arguments:
pat - the string pattern or regular expression to look for within each element of the Series
case (optional) - specifies whether the match is case-sensitive or case-insensitive
na (optional) - a fill value for missing values
regex (optional) - specifies whether to treat the pattern as a regular expression or as a literal string
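The effect of the regex argument is easiest to see with a pattern that contains a regex metacharacter. The following is a minimal sketch, assuming a hypothetical versions Series that is not part of the examples in this article; the dot in '1.0' matches any character when regex=True and only a literal dot when regex=False.

```python
import pandas as pd

# hypothetical Series used only for this illustration
versions = pd.Series(['v1.0', 'v1x0', 'v2.0'])

# default regex=True: '.' matches any single character
as_regex = versions.str.contains('1.0')

# regex=False: '1.0' is matched as a literal substring
as_literal = versions.str.contains('1.0', regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```

Passing regex=False is a reasonable choice whenever the search text comes from user input or contains characters such as ., *, or (.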
str.contains() Return Value
The str.contains() method returns a Boolean Series showing whether each element in the Series contains the pattern or regex.
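Because the return value is a Boolean Series, it can be used directly as a mask. The sketch below reuses the cities Series from the first example; the variable names mask, matching, non_matching, and match_count are illustrative only.

```python
import pandas as pd

cities = pd.Series(['New York', 'London', 'Tokyo', 'Paris', 'Moscow'])

# Boolean Series: True where the city name contains 'o'
mask = cities.str.contains('o')

# keep only the matching elements
matching = cities[mask]

# invert the mask with ~ to keep the non-matching elements
non_matching = cities[~mask]

# True counts as 1, so sum() gives the number of matches
match_count = mask.sum()

print(matching.tolist())      # ['New York', 'London', 'Tokyo', 'Moscow']
print(non_matching.tolist())  # ['Paris']
print(match_count)            # 4
```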
Example 1: Check Which Series Elements Contain Given Substring
import pandas as pd
# create a Series
data = pd.Series(['apple', 'banana', 'cherry', 'date'])
# use contains() to check which elements contain the substring 'a'
contains_a = data.str.contains('a')
print(contains_a)
Output
0 True
1 True
2 False
3 True
dtype: bool
In the above example, we first created the data Series with fruit names.
Then, we used the str.contains() method to check which elements in the Series contain the substring a.
The result is a Series of Boolean values (True or False), indicating whether each element in data contains a.
Example 2: Case-Sensitive and Case-Insensitive Searches with case Parameter
import pandas as pd
# create a Series
data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])
# case-sensitive search (default behavior)
case_sensitive_result = data.str.contains('a')
# case-insensitive search
case_insensitive_result = data.str.contains('a', case=False)
print("Case-sensitive search:\n", case_sensitive_result)
print("\nCase-insensitive search:\n", case_insensitive_result)
Output
Case-sensitive search:
0 False
1 True
2 False
3 True
4 False
dtype: bool

Case-insensitive search:
0 True
1 True
2 False
3 True
4 True
dtype: bool
Here,
data.str.contains('a') - only returns True for elements where a appears in the exact case specified (lowercase a).
data.str.contains('a', case=False) - ignores the case of a, thus matching both a and A in any element of the data Series.
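Case-insensitive matching can also be requested through regex flags. The sketch below relies on the flags parameter of str.contains(), which is not covered in this article and is introduced here as an addition; it passes re.IGNORECASE from Python's re module and should give the same result as case=False for this data.

```python
import re
import pandas as pd

data = pd.Series(['Apple', 'banana', 'Cherry', 'Date', 'APRICOT'])

# case-insensitive search using the case parameter
with_case = data.str.contains('a', case=False)

# equivalent search using a regex flag (flags is not described in this article)
with_flags = data.str.contains('a', flags=re.IGNORECASE)

print(with_case.tolist())   # [True, True, False, True, True]
print(with_flags.tolist())  # [True, True, False, True, True]
```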
Example 3: Handle Missing Data with na Parameter in Pandas str.contains()
import pandas as pd
# create a Series with missing values
data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])
# check which elements contain 'a', treating missing values as False
result_with_na_false = data.str.contains('a', na=False)
# check which elements contain 'a', treating missing values as True
result_with_na_true = data.str.contains('a', na=True)
print("With na=False:\n", result_with_na_false)
print("\nWith na=True:\n", result_with_na_true)
Output
With na=False:
0 True
1 True
2 False
3 False
4 False
5 True
dtype: bool

With na=True:
0 True
1 True
2 True
3 False
4 True
5 True
dtype: bool
In this example, when
na=False, missing values (None) in the Series result in False in the output result_with_na_false Series.
na=True, missing values in the Series result in True in the output result_with_na_true Series.
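A common reason to set na=False is to get a clean Boolean mask for filtering a Series that contains missing values. The following is a minimal sketch using the same data; the names mask and filtered are illustrative.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', None, 'cherry', None, 'date'])

# na=False makes every missing value count as "no match",
# so the mask contains only True and False
mask = data.str.contains('a', na=False)

# select the elements that contain 'a'; missing values are dropped
filtered = data[mask]

print(filtered.tolist())  # ['apple', 'banana', 'date']
```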
Example 4: Using Regular Expression in str.contains()
import pandas as pd
# create a Series
data = pd.Series(['Apple123', 'banana', 'Cherry', 'Date', 'XYZ', '12345', 'abc'])
# regular expression to find strings containing digits or a, b, c (case insensitive)
regex_pattern = '[0-9abcABC]'
# use str.contains() with the regex pattern
result = data.str.contains(regex_pattern, regex=True)
print(result)
Output
0 True
1 True
2 True
3 True
4 False
5 True
6 True
dtype: bool
In the above example, the regex pattern [0-9abcABC] looks for any character that is either a digit from 0 to 9 or one of the letters a, b, or c in either upper or lower case.
The str.contains() method then applies this pattern to each element in the data Series. Since regex=True is the default, passing it explicitly is optional.
Hence, the result Series will contain True for elements that match the pattern and False for those that don't.
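Other regex features work the same way. As a hedged sketch on the same data Series, the patterns below (chosen only for illustration) use the ^ and $ anchors and the | alternation operator.

```python
import pandas as pd

data = pd.Series(['Apple123', 'banana', 'Cherry', 'Date', 'XYZ', '12345', 'abc'])

# ^ anchors the match at the start: strings beginning with an uppercase letter
starts_upper = data.str.contains(r'^[A-Z]')

# $ anchors the match at the end: strings ending with a digit
ends_digit = data.str.contains(r'\d$')

# | means "or": strings containing 'an' or 'at'
an_or_at = data.str.contains(r'an|at')

print(starts_upper.tolist())  # [True, False, True, True, True, False, False]
print(ends_digit.tolist())    # [True, False, False, False, False, True, False]
print(an_or_at.tolist())      # [False, True, False, True, False, False, False]
```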
Note: To learn more about Regular Expressions, please visit Python RegEx.