How to load files into Python with pandas read_csv()
Python pandas read_csv() is one of the most commonly used functions for reading CSV files into pandas and storing them as DataFrames. CSV (comma-separated values) files are a widely used format for storing tabular data and are supported by many applications.
What is the syntax for Python pandas read_csv()?
pandas.read_csv() creates a pandas DataFrame from a CSV file. The basic syntax of the function looks like this:
import pandas as pd
# Simplified overview: only a few of the optional parameters are listed; "..." stands for the rest
df = pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, dtype=None, ...)
What are the most important parameters for pandas.read_csv()?
pandas.read_csv() accepts a wide variety of parameters. To keep things simple, we’ll focus on the most important ones. Here’s an overview of the key parameters you can use to control how the function behaves:
Parameter | Meaning | Default Value |
---|---|---|
filepath_or_buffer | The path to the CSV file (as a string or path object), a URL, or a file-like object (buffer) to read from | none (required) |
sep | The delimiter between values | ',' |
header | The row number to use as the column names; with 'infer', the first row is used unless names is given | 'infer' |
names | A list of column names to use, typically together with header=None when the file has no header row | None |
index_col | The column(s) to use as the row index | None |
usecols | The subset of columns to load into the DataFrame | None |
dtype | The data type for the data or for individual columns, e.g. a dict such as {'Salary': 'float64'} | None |
You can find a comprehensive list of the parameters for this function in the pandas documentation.
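To see several of these parameters working together, here is a minimal sketch. It assumes a semicolon-delimited variant of the sample data used later in this article (the semicolons are chosen only to illustrate sep) and uses io.StringIO as an in-memory stand-in for a file, which read_csv accepts as a buffer:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated version of the sample data, held in memory.
raw = io.StringIO(
    "1;John Avery;35;Nottingham;50000\n"
    "2;Adelaide Smith;29;London;62000\n"
    "3;Michael Rivera;41;Cardiff;40000\n"
)

df = pd.read_csv(
    raw,
    sep=";",                                        # values are separated by semicolons
    header=None,                                    # the data has no header row
    names=["ID", "Name", "Age", "City", "Salary"],  # supply our own column names
    usecols=["ID", "Name", "Salary"],               # load only these three columns
    index_col="ID",                                 # use the ID column as the row index
    dtype={"Salary": "float64"},                    # store salaries as floats
)

print(df)
```

Because names is passed, usecols and index_col can refer to columns by those names rather than by position.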
How to access CSV files step by step
Using pandas.read_csv(), you can easily load data from CSV files into Python in just a few steps.
In the following examples, we’ll work with a CSV file that’s structured like this (note that it has no header row):
1,John Avery,35,Nottingham,50000
2,Adelaide Smith,29,London,62000
3,Michael Rivera,41,Cardiff,40000
4,Grace Kim,33,Hull,35000
5,Tyler Johnson,28,Kent,52000
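If you want to follow along, you can create the sample file yourself. A minimal sketch, assuming the file should be called data.csv (the name used in Step 2) and live in the current working directory:

```python
from pathlib import Path

# The five sample records from above, without a header row.
sample = """1,John Avery,35,Nottingham,50000
2,Adelaide Smith,29,London,62000
3,Michael Rivera,41,Cardiff,40000
4,Grace Kim,33,Hull,35000
5,Tyler Johnson,28,Kent,52000
"""

# Write the text to data.csv in the current working directory.
path = Path("data.csv")
path.write_text(sample, encoding="utf-8")
print(f"Wrote {len(sample.splitlines())} rows to {path.resolve()}")
```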
Step 1: Import pandas
First, import the pandas library into your Python script.
import pandas as pd
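Step 2: Load the CSV file
Now you can load your CSV file into pandas with the read_csv() function. Simply pass the file path to the function. In the following code, we’ll use a file named data.csv. A relative path like this is resolved against the current working directory, so the file needs to be in the directory you run the script from (often, but not always, the script’s own directory):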
df = pd.read_csv('data.csv')
The code above loads the data into a DataFrame object (df), which we can then work with. By default (header='infer'), pandas interprets the first row as column headers unless you specify otherwise.
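To check how header inference behaves on this headerless file, you can compare the default call with header=None. This sketch uses io.StringIO with the sample records as a stand-in for data.csv so it runs without a file on disk:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same records, no header row).
csv_text = (
    "1,John Avery,35,Nottingham,50000\n"
    "2,Adelaide Smith,29,London,62000\n"
    "3,Michael Rivera,41,Cardiff,40000\n"
    "4,Grace Kim,33,Hull,35000\n"
    "5,Tyler Johnson,28,Kent,52000\n"
)

# Default behavior: the first line is consumed as the header.
inferred = pd.read_csv(io.StringIO(csv_text))
print(list(inferred.columns))   # ['1', 'John Avery', '35', 'Nottingham', '50000']
print(len(inferred))            # 4

# header=None keeps all five lines as data and numbers the columns 0-4.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(no_header.columns))  # [0, 1, 2, 3, 4]
print(len(no_header))           # 5
```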
Step 3: Display the CSV file
It’s a good idea to look at the first few rows of the DataFrame to make sure the file has been loaded correctly. You can use the DataFrame.head() method for this. By default, it shows the first five rows of the DataFrame, giving you a quick overview of the data’s structure:
print(df.head())
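The output looks like this:
1 John Avery 35 Nottingham 50000
0 2 Adelaide Smith 29 London 62000
1 3 Michael Rivera 41 Cardiff 40000
2 4 Grace Kim 33 Hull 35000
3 5 Tyler Johnson 28 Kent 52000
Because the sample file has no header row, pandas has used the first record as the column names, leaving only four data rows. Step 4 shows how to fix this.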
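head() also takes the number of rows to show, and tail() and shape are equally quick sanity checks. A short sketch, again with io.StringIO standing in for data.csv and header=None so that all five records are kept as data:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same records, no header row).
csv_text = (
    "1,John Avery,35,Nottingham,50000\n"
    "2,Adelaide Smith,29,London,62000\n"
    "3,Michael Rivera,41,Cardiff,40000\n"
    "4,Grace Kim,33,Hull,35000\n"
    "5,Tyler Johnson,28,Kent,52000\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=None)

first_two = df.head(2)  # only the first two rows
last_one = df.tail(1)   # only the last row

print(first_two)
print(last_one)
print(df.shape)         # (5, 5): five rows, five columns
```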
Step 4: Change the column names (optional)
If your CSV file doesn’t have a header row, you can define the column names manually:
df = pd.read_csv('data.csv', header=None, names=['ID', 'Name', 'Age', 'City', 'Salary'])
In this example, we’ve named the columns ID, Name, Age, City and Salary. Printing df.head() again now gives the following output:
ID Name Age City Salary
0 1 John Avery 35 Nottingham 50000
1 2 Adelaide Smith 29 London 62000
2 3 Michael Rivera 41 Cardiff 40000
3 4 Grace Kim 33 Hull 35000
4 5 Tyler Johnson 28 Kent 52000
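Passing names while reading is one option; assigning to DataFrame.columns after loading is an alternative that gives the same result. A small sketch comparing the two, with io.StringIO in place of data.csv:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same records, no header row).
csv_text = (
    "1,John Avery,35,Nottingham,50000\n"
    "2,Adelaide Smith,29,London,62000\n"
    "3,Michael Rivera,41,Cardiff,40000\n"
    "4,Grace Kim,33,Hull,35000\n"
    "5,Tyler Johnson,28,Kent,52000\n"
)
column_names = ["ID", "Name", "Age", "City", "Salary"]

# Option 1 (as in Step 4): supply the names while reading.
df_named = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)

# Option 2: read with numbered columns, then assign the names afterwards.
df_renamed = pd.read_csv(io.StringIO(csv_text), header=None)
df_renamed.columns = column_names

print(df_named.equals(df_renamed))  # True
print(df_named.loc[0, "Name"])      # John Avery
```

Supplying names at read time is usually the tidier choice, since the column labels are correct from the moment the DataFrame exists.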
The example above uses only a small amount of data, so it’s easy to manage. However, if you have a large CSV file, it’s a good idea to read it into pandas in chunks to avoid running out of memory. The chunksize parameter of pandas.read_csv() specifies how many rows to read at a time. Instead of a single DataFrame, the function then returns an iterator that yields DataFrames of up to that many rows, which you can process one at a time in a Python for loop.
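A minimal sketch of chunked reading, assuming you only need a running total of the Salary column (total_salary and the other variable names are illustrative). The chunk size of 2 is chosen only so the five sample rows span several chunks; for a real large file you would pick a much larger value. io.StringIO again stands in for the file:

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same records, no header row).
csv_text = (
    "1,John Avery,35,Nottingham,50000\n"
    "2,Adelaide Smith,29,London,62000\n"
    "3,Michael Rivera,41,Cardiff,40000\n"
    "4,Grace Kim,33,Hull,35000\n"
    "5,Tyler Johnson,28,Kent,52000\n"
)

total_salary = 0
row_count = 0
chunk_sizes = []

# With chunksize set, read_csv returns a reader object that yields DataFrames.
with pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["ID", "Name", "Age", "City", "Salary"],
    chunksize=2,
) as reader:
    for chunk in reader:
        # Each chunk is an ordinary DataFrame with at most 2 rows.
        chunk_sizes.append(len(chunk))
        total_salary += chunk["Salary"].sum()
        row_count += len(chunk)

print(chunk_sizes)   # [2, 2, 1]
print(row_count)     # 5
print(total_salary)  # 239000
```

Only one chunk is held in memory at a time, so this pattern suits aggregations that can be updated incrementally.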