How to filter for distinct values with pandas DataFrame[].unique()
In Python pandas, you can use the unique() function to identify unique values in a column of a DataFrame. This makes it easy to get a quick overview of the different values within your dataset.
What is the syntax of pandas DataFrame[].unique()?
The basic syntax for using pandas unique() is simple, because the function doesn’t take any parameters:
DataFrame['column_name'].unique()
Keep in mind that unique() can only be applied to one column at a time. It is a method of a single column (a pandas Series), not of the DataFrame as a whole, so before calling it you’ll need to select the column you want to evaluate. The unique() function returns the column’s distinct values, usually as a numpy array, in the order in which they first appear, with duplicates removed. It doesn’t, however, sort the values.
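The behavior described above can be checked with a minimal sketch. The Series name and its values here are hypothetical and chosen only for illustration; they are not part of the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column; the values are for illustration only
scores = pd.Series([30, 10, 30, 20, 10], name="score")

# unique() takes no arguments and drops repeated values
distinct = scores.unique()

print(distinct)        # values in order of first appearance, not sorted
print(type(distinct))  # a numpy array for a plain integer column like this one
```

Because 30 appears first in the column, it also comes first in the result, even though it is the largest value.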
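If you’ve been working with Python for a while, you may be familiar with numpy.unique(), the numpy equivalent to pandas unique(). Unlike the pandas version, numpy.unique() returns its result in sorted order. The pandas version skips that sorting step and is generally preferable for efficiency reasons, especially on longer columns.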
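To make the difference between the two functions concrete, here is a small sketch that runs both on the same data. The variable names are illustrative, and the example only compares the ordering of the results; it does not measure speed.

```python
import numpy as np
import pandas as pd

# Illustrative column of city names with duplicates
cities = pd.Series(['Newcastle', 'London', 'Newcastle', 'Cardiff', 'London'])

# pandas: distinct values in order of first appearance
pandas_result = cities.unique()

# numpy: distinct values in sorted order
numpy_result = np.unique(cities.to_numpy())

print(pandas_result)
print(numpy_result)
```

If you need sorted output from pandas, you can sort the result of unique() afterwards; if you need the original order, the pandas version gives it to you directly.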
How to use pandas DataFrame[].unique()
To use unique() on a pandas DataFrame, you first need to specify the column you want to check. In the following example, we’ll use a DataFrame that contains the name, age and city of residence of a group of individuals.
import pandas as pd
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['Newcastle', 'London', 'Newcastle', 'Cardiff', 'London']
}
df = pd.DataFrame(data)
print(df)
The resulting DataFrame looks like this:
      Name  Age       City
0    Alice   24  Newcastle
1      Bob   27     London
2  Charlie   22  Newcastle
3    David   32    Cardiff
4   Edward   29     London
Now, let’s say we want to create a list of all the cities where the people in the DataFrame live. We can apply the pandas unique() function to the column that contains the cities.
# Find different cities
unique_cities = df['City'].unique()
print(unique_cities)
The output is a numpy array that lists each city once, showing that the individuals in the DataFrame are from a total of three cities: Newcastle, London and Cardiff.
['Newcastle' 'London' 'Cardiff']
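If you would rather work with a plain Python list than a numpy array, or only need the number of distinct cities, the following sketch shows one possible approach. It rebuilds only the 'City' column from the example above; tolist() and nunique() are existing numpy and pandas methods, while the variable names are illustrative.

```python
import pandas as pd

# Rebuild only the 'City' column from the example DataFrame
df = pd.DataFrame({
    'City': ['Newcastle', 'London', 'Newcastle', 'Cardiff', 'London']
})

# Convert the numpy array returned by unique() into a regular Python list
city_list = df['City'].unique().tolist()

# Count the distinct values directly, without building the array first
city_count = df['City'].nunique()

print(city_list)
print(city_count)
```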