Hola, welcome back to another exciting tutorial on “How to load a CSV file into a pandas DataFrame”. In this Python tutorial, you’ll learn the pandas read_csv method, which reads CSV data and loads it into a pandas DataFrame.
You’ll also learn the mandatory and optional parameters of the read_csv syntax. At the end, you will see a live coding demo for better understanding. Let’s begin with an introduction to the CSV file, followed by an introduction to Python pandas and the pandas DataFrame.
The term CSV stands for comma-separated values. A CSV file is a text file in which each field value is delimited by a “,” (comma). These files are generally used to store data in a tabular format.
You will come across these files quite frequently, since CSV is one of the most widely used data exchange formats. You can use CSV files to import or export data from an Excel sheet, a database, etc. The file extension for a comma-separated values file is *.csv.
Following is an example of a loan.csv file, which you will load into a pandas DataFrame later in this tutorial.
1077501,1296599,5000
1077430,1314167,2500
1077175,1313524,2400
What are Delimiters in a CSV file?
A delimiter is a special character that separates column values in a dataset. The special character can be a comma (,), a semicolon (;), a hash (#), etc. In the loan.csv file above, the comma (,) is used as the delimiter.
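As a quick illustration of the sep parameter, here is a minimal sketch that reads semicolon-delimited text from an in-memory buffer (io.StringIO) instead of a file. The column names id, member_id and loan_amnt are hypothetical, chosen only for this example; the values come from the loan.csv sample above.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data (values from the loan.csv sample,
# column names made up for illustration).
data = """id;member_id;loan_amnt
1077501;1296599;5000
1077430;1314167;2500
1077175;1313524;2400
"""

# sep tells read_csv which character separates the column values.
df = pd.read_csv(io.StringIO(data), sep=";")
print(df.shape)             # (3, 3)
print(df.columns.tolist())  # ['id', 'member_id', 'loan_amnt']

# With the default sep="," nothing gets split, so everything lands in one column.
wrong = pd.read_csv(io.StringIO(data))
print(wrong.shape)          # (3, 1)
```

A single, oddly named column like the one in `wrong` is usually the first sign that the delimiter passed to read_csv does not match the file.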
Pandas is an open-source library written for the Python programming language. Pandas is a robust, prominent, and comprehensive data analysis library. It provides methods to read, write, and update datasets.
It is widely used to prepare data for machine learning, with the data held in DataFrames. Pandas supports various data manipulation operations such as group by, join, and merge.
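To make group by and merge concrete, here is a small sketch using two hypothetical tables, loans and members, built in memory. It uses the standard DataFrame.groupby and DataFrame.merge methods; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical tables for illustration only.
loans = pd.DataFrame({
    "member_id": [1, 1, 2],
    "loan_amnt": [5000, 2500, 2400],
})
members = pd.DataFrame({
    "member_id": [1, 2],
    "grade": ["A", "B"],
})

# Group by: total loan amount per member.
totals = loans.groupby("member_id")["loan_amnt"].sum()
print(totals)

# Merge: attach each member's grade to their loans (SQL-style inner join).
merged = loans.merge(members, on="member_id", how="inner")
print(merged)
```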
A pandas DataFrame is an object that represents data in the form of rows and columns. DataFrames are generally created from .csv (comma-separated) files, Excel spreadsheets, tuples, or lists, and they are similar to Excel worksheets or a DB2 table. A pandas DataFrame has an index (the row labels) and a header (the column labels) along with the data rows. The following DataFrame snapshot illustrates how an Excel sheet maps to a pandas DataFrame.
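The following minimal sketch builds a DataFrame from a list of tuples (one of the sources mentioned above) and prints its header and index. The column names are hypothetical; the values are the loan.csv sample rows.

```python
import pandas as pd

# Rows as a list of tuples; column names are hypothetical.
rows = [
    (1077501, 1296599, 5000),
    (1077430, 1314167, 2500),
    (1077175, 1313524, 2400),
]
df = pd.DataFrame(rows, columns=["id", "member_id", "loan_amnt"])

print(df.columns.tolist())  # header (column labels): ['id', 'member_id', 'loan_amnt']
print(df.index.tolist())    # default index (row labels): [0, 1, 2]
print(df)
```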
Programmers generally use the pandas library for data analysis in Python. It is one of the most popular Python libraries and provides various methods to import and manipulate data from different sources. One of the most common data exchange formats is the CSV file.
Now, let’s focus on the pandas read_csv method; its name doesn’t do justice to its functionality. Many people think that read_csv can only read CSV files, but it can read any delimited text file, for example, a .txt file whose values are separated by commas.
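Here is a minimal sketch of reading a comma-delimited .txt file with read_csv. It writes a small hypothetical file, loans.txt, to a temporary directory so the example is self-contained.

```python
import os
import tempfile

import pandas as pd

# Write a small comma-delimited .txt file (name and columns are hypothetical).
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "loans.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("id,member_id,loan_amnt\n")
        f.write("1077501,1296599,5000\n")
        f.write("1077430,1314167,2500\n")

    # read_csv only cares about the delimiter, not the file extension.
    df_txt = pd.read_csv(txt_path, sep=",")

print(df_txt.shape)  # (2, 3)
```

For a tab-delimited file you would pass sep="\t" instead.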
In the pandas version used in this tutorial, read_csv has 49 parameters, but only the file path is mandatory; all the others are optional. The following syntax shows a minimal set of commonly used parameters.
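# Python read_csv pandas syntax with a
# minimal set of commonly used parameters.
import pandas as pd

filepath = "loan.csv"  # path to your CSV file

df = pd.read_csv(filepath,
                 sep=',',
                 dtype=None,
                 header='infer',  # default; use None if the file has no header row
                 skiprows=None,
                 index_col=None,
                 skip_blank_lines=True,
                 na_filter=True)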
Now, let’s understand the importance of these parameters.
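- filepath: The filepath parameter (named filepath_or_buffer in the full signature) specifies the file location. It can be a local path or a URL; a local file can also be passed as a URL such as file://localhost/path/to/table.csv.
- sep: The sep parameter specifies the delimiter used in the file (default ',').
- dtype: The dtype parameter specifies the column data type (e.g. integer or float), either for all columns or per column via a dict such as {'id': np.int32}.
- header: The header parameter specifies the row number(s) to use as column names. Pass header=None if the file has no header row, or a list of row numbers to build a multi-level column header.
- skiprows: The skiprows parameter is used to skip lines at the start of the file; for example, skiprows=5 skips the first five lines, so reading starts from the 6th line.
- index_col: The index_col parameter is used to specify the column(s) to use as the row labels of the DataFrame.
- skip_blank_lines: If True (the default), blank lines are skipped instead of being read as rows of NaN values.
- na_filter: If True (the default), read_csv detects missing-value markers (such as empty fields or “NA”) and converts them to NaN; it does not drop them. Setting na_filter=False can speed up reading a large file that has no missing values.
- low_memory: Internally processes the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To avoid mixed types, set it to False or specify dtype.
- encoding: The character encoding to use when reading the file (e.g. ‘utf-8’ or ‘ISO-8859-1’).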
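The sketch below exercises several of the parameters listed above on a small in-memory file. The file content, including the two junk lines at the top, the blank line, and the missing loan_amnt value, is hypothetical and chosen so that each parameter has a visible effect.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical file content: two junk lines before the real header,
# a blank line, and a missing loan_amnt value.
raw = """report line 1
report line 2
id,member_id,loan_amnt
1077501,1296599,5000

1077430,1314167,
1077175,1313524,2400
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=",",
    skiprows=2,             # skip the two lines above the real header
    header=0,               # the first remaining line holds the column names
    index_col="id",         # use the id column as the row labels
    dtype={"member_id": np.int64},
    skip_blank_lines=True,  # the empty line is ignored (default)
    na_filter=True,         # the empty loan_amnt field becomes NaN (default)
)
print(df)

# With na_filter=False no missing-value detection is done,
# so the loan_amnt column contains no NaN.
df_raw = pd.read_csv(io.StringIO(raw), skiprows=2, na_filter=False)
print(df_raw["loan_amnt"].isna().sum())
```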
Now, let’s get our hands dirty with actual code. I am using Jupyter Notebook for the read_csv pandas demo. We will use a loan.csv file; you can download a CSV file from the internet or use your own.
Please note: if your CSV file is in the same directory as your notebook, you don’t need to specify the full path. If the file is located elsewhere, you need to specify its complete path.
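A minimal sketch of the difference between a relative and a full path, using pathlib. A temporary directory stands in for “a different location” so the example runs anywhere; the file name loan.csv and its contents are placeholders.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Relative path: resolved against the notebook's current working directory.
relative_path = Path("loan.csv")
print(relative_path.resolve())  # the absolute path pandas would try to open

# Full path: works regardless of the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    full_path = Path(tmp_dir) / "loan.csv"
    full_path.write_text("id,loan_amnt\n1077501,5000\n", encoding="utf-8")
    df = pd.read_csv(full_path)  # read_csv accepts pathlib.Path objects

print(df.shape)
```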
import numpy as np  # used for the np.int32 dtype below
import pandas as pd

# Read the first 16 rows of loan.csv into df_loan.
df_loan = pd.read_csv("loan.csv", sep=",",
                      encoding="ISO-8859-1",
                      index_col=None,
                      low_memory=False,
                      dtype={'id': np.int32}, nrows=16, skiprows=0)
# Display the first 3 rows.
df_loan.head(3)
The code snippet above reads the first 16 rows (nrows=16) of loan.csv, stores the id column as a 32-bit integer, and loads the data into the df_loan DataFrame.
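If you don’t have loan.csv at hand, the following sketch mimics the call above on a three-row, in-memory stand-in (the column names are hypothetical) and checks what nrows and dtype actually do.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for loan.csv; column names are hypothetical.
csv_text = """id,member_id,loan_amnt
1077501,1296599,5000
1077430,1314167,2500
1077175,1313524,2400
"""

df = pd.read_csv(io.StringIO(csv_text), dtype={"id": np.int32}, nrows=2, skiprows=0)
print(len(df))    # 2: nrows limits how many data rows are read
print(df.dtypes)  # id is int32; the other columns keep their inferred types
```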
import pandas as pd

# Read loan.csv and use the 'id' column as the row index.
df_loan = pd.read_csv("loan.csv", sep=",",
                      encoding='utf-8',
                      index_col='id')
# Display the first 3 rows of the DataFrame.
df_loan.head(3)
Note: In the above example, the ‘id’ column is set as the index of the DataFrame.
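Once id is the index, rows can be looked up by their id with .loc. A minimal sketch on an in-memory stand-in for loan.csv (hypothetical column names):

```python
import io

import pandas as pd

# In-memory stand-in for loan.csv; column names are hypothetical.
csv_text = """id,member_id,loan_amnt
1077501,1296599,5000
1077430,1314167,2500
1077175,1313524,2400
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(df.index.name)                 # id
print(df.loc[1077430, "loan_amnt"])  # 2500
print("id" in df.columns)            # False: id is now the index, not a column
```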
import pandas as pd

# Read loan.csv as Latin-1; low_memory=False avoids mixed-type
# inference by not processing the file in chunks.
df_loan = pd.read_csv("loan.csv", sep=",",
                      encoding="ISO-8859-1",
                      index_col=None,
                      low_memory=False)
df_loan.head(3)
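The snippets here read loan.csv with encoding="ISO-8859-1". As a hedged illustration of why the encoding matters, the sketch below encodes a hypothetical one-row file as Latin-1 and tries to read it both ways; the byte 0xE9 (“é” in Latin-1) is not valid UTF-8 on its own, so the UTF-8 attempt fails.

```python
import io

import pandas as pd

# Hypothetical Latin-1 encoded file content.
raw_bytes = "id,city\n1,Montréal\n".encode("ISO-8859-1")

utf8_failed = False
try:
    pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("utf-8 failed:", exc.reason)

# Decoding with the encoding the file was written in works.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
print(df.loc[0, "city"])  # Montréal
```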
import numpy as np  # used for the np.int32 dtype below
import pandas as pd

# Read only the first six columns (by position) and store 'id' as int32.
df_loan = pd.read_csv("loan.csv", sep=",",
                      encoding="ISO-8859-1",
                      index_col=None,
                      low_memory=False,
                      usecols=[0, 1, 2, 3, 4, 5],
                      dtype={'id': np.int32})
df_loan.head(3)
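usecols also accepts column names instead of positions. The sketch below, on a hypothetical four-column stand-in for loan.csv, shows that both forms select the same columns; names have the advantage of not depending on where a column sits in the file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for loan.csv with an extra "term" column.
csv_text = """id,member_id,loan_amnt,term
1077501,1296599,5000,36 months
1077430,1314167,2500,60 months
"""

# Select by position: the first three columns.
df_pos = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1, 2],
                     dtype={"id": np.int32})
# Select the same columns by name.
df_name = pd.read_csv(io.StringIO(csv_text), usecols=["id", "member_id", "loan_amnt"],
                      dtype={"id": np.int32})

print(df_pos.columns.tolist())  # ['id', 'member_id', 'loan_amnt']
print(df_pos.equals(df_name))   # True
```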
import pandas as pd

# Read loan.csv into df_loan.
df_loan = pd.read_csv("loan.csv", sep=",",
                      encoding="ISO-8859-1",
                      index_col=None,
                      low_memory=False)
# Print the first two rows (print() is needed because a notebook
# cell only displays its last expression automatically).
print(df_loan.head(2))
# Display the last two rows.
df_loan.tail(2)
You can use Python’s built-in help() function to get details about the syntax and all possible parameters.
# Get help from Python on the read_csv syntax.
import pandas as pd

help(pd.read_csv)
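Besides help(), Python’s inspect module can list the parameters programmatically. This is a small sketch; the exact number of parameters depends on your installed pandas version, so it will not necessarily be 49.

```python
import inspect

import pandas as pd

params = inspect.signature(pd.read_csv).parameters
print(len(params))       # varies with the installed pandas version
print(list(params)[:3])  # the first parameter is filepath_or_buffer

# Parameters without a default value are the mandatory ones.
required = [name for name, p in params.items()
            if p.default is inspect.Parameter.empty]
print(required)
```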
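# Full read_csv signature, as printed by help() in the pandas version
# used for this tutorial. Newer pandas releases have deprecated or
# removed some of these (e.g. squeeze, prefix, error_bad_lines, warn_bad_lines).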
read_csv(filepath_or_buffer,
sep=',',
delimiter=None,
header='infer',
names=None,
index_col=None,
usecols=None,
squeeze=False,
prefix=None,
mangle_dupe_cols=True,
dtype=None,
engine=None,
converters=None,
true_values=None,
false_values=None,
skipinitialspace=False,
skiprows=None,
skipfooter=0,
nrows=None,
na_values=None,
keep_default_na=True,
na_filter=True,
verbose=False,
skip_blank_lines=True,
parse_dates=False,
infer_datetime_format=False,
keep_date_col=False,
date_parser=None,
dayfirst=False,
iterator=False,
chunksize=None,
compression='infer',
thousands=None,
decimal=b'.',
lineterminator=None,
quotechar='"',
quoting=0,
doublequote=True,
escapechar=None,
comment=None,
encoding=None,
dialect=None,
tupleize_cols=None,
error_bad_lines=True,
warn_bad_lines=True,
delim_whitespace=False,
low_memory=True,
memory_map=False,
float_precision=None)
This Python tutorial is accompanied by a YouTube video. Please watch it for a better understanding and a practical demonstration of the pandas read_csv method.