How to Convert a Shapefile to a DataFrame in Python
An overview of the GeoPandas Python library, with a step-by-step example
Data science applications often require working with data in geographic space. Shapefiles are files that store geospatial data organized using a file-based database. Shapefiles are used by GIS professionals, local government agencies, and businesses for mapping and analysis.
In this blog post I will describe an elegant way of working with geospatial data in Python, through a practical example. I will be using GeoPandas, a Python library for plotting, analysing, and mapping geospatial data. GeoPandas extends the popular Pandas library to deal with geographical data. I will also take a look at how to plot the results.
GeoPandas can be installed through the following command:
pip3 install geopandas
The tutorial is organized as follows:
- Load Dataset
- Plot Data
- Operations on the Geometry
Load Dataset
To load a geographical dataset, I can use the read_file() function, which automatically detects the format of the dataset. If the file is a shapefile, I should make sure that the folder containing the shapefile also includes the .prj, .dbf, and .shx files.
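As a minimal sketch of the check described above, the snippet below looks for the companion files next to a .shp file before it is loaded. The helper name missing_sidecars and the throwaway folder are assumptions made for illustration; they are not part of GeoPandas or of the original tutorial.

```python
import tempfile
from pathlib import Path

# Companion extensions named in the text.
REQUIRED_SIDECARS = (".shx", ".dbf", ".prj")


def missing_sidecars(shp_path):
    """Return the companion extensions that are absent next to a .shp file.

    Hypothetical helper, written for this example only.
    """
    shp = Path(shp_path)
    return [ext for ext in REQUIRED_SIDECARS if not shp.with_suffix(ext).exists()]


# Demo on a temporary folder: create every file except the .prj one.
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "points.shp"
    for ext in (".shp", ".shx", ".dbf"):
        shp.with_suffix(ext).touch()
    missing = missing_sidecars(shp)

print(missing)  # ['.prj']
```

An empty list means all the companion files are in place and the shapefile can be passed to read_file().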
In this tutorial, I use a dataset containing Italian points of interest, provided by Map Cruzin. This shapefile is derived from OpenStreetMap.org and is licensed under the Open Data Commons Open Database License (ODbL).
import geopandas as gpd
df = gpd.read_file('../../Datasets/italy-points-shape/points.shp')
The Geometry field may contain POINTS, MULTILINES, POLYGONS and so on. The dataset may contain more than one geometry field, but only one of them can be active at a time. The active field can be set through the set_geometry() function:
df = df.set_geometry('geometry')
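To illustrate what "active geometry" means, here is a small sketch with two geometry columns. The data are made up (two arbitrary points), and the second column, called buffered here, is a hypothetical name chosen only for this example.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy data (hypothetical): two points with no real-world meaning.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# Add a second geometry column: a circular buffer around each point.
gdf["buffered"] = gdf.geometry.buffer(0.5)

active_before = gdf.geometry.name   # name of the active geometry column
gdf = gdf.set_geometry("buffered")  # switch the active geometry
active_after = gdf.geometry.name

print(active_before, "->", active_after)  # geometry -> buffered
```

After the switch, methods that work on the active geometry, such as plot() or distance(), operate on the buffered polygons instead of the original points.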
The file is loaded as a GeoPandas GeoDataFrame. Since GeoDataFrame is a subclass of the Pandas DataFrame, I can use all the Pandas DataFrame methods on it. For example, I can count the number of records in the usual Pandas way.
The dataset contains 47,427 records.
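The following sketch shows the subclass relationship on a small, made-up GeoDataFrame. The variable toy and its coordinates are illustrative and are not the Italian dataset. Counting rows with len() is one common Pandas idiom, shown here as an assumption about how the record count could be obtained.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Made-up points of interest, used only for illustration.
toy = gpd.GeoDataFrame(
    {"type": ["bar", "cafe", "bar"]},
    geometry=[Point(12.5, 41.9), Point(9.19, 45.46), Point(11.25, 43.77)],
)

is_pandas = isinstance(toy, pd.DataFrame)  # GeoDataFrame subclasses DataFrame
n_records = len(toy)                       # number of rows
type_counts = toy["type"].value_counts()   # an ordinary Pandas method

print(is_pandas, n_records, type_counts["bar"])  # True 3 2
```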
Plot Data
I can plot a first map through the plot() function provided by GeoPandas. If a dataset contains more than one geometry column, plot() draws the active one.
The previous map is too small, so it can be improved by using matplotlib. First, I increase the figure size: I define a figure with subplots() at the desired size and then pass the ax variable to the GeoDataFrame plot:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(15, 15))
df.plot(ax=ax)
I can also change the color of the dots according to the type column. This type of plot is called a choropleth map. I calculate the number of different types:
There are 301 different types. To make the map more readable, I keep only the types with more than 300 points.
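# Flag the types that occur more than 300 times
target_types = df['type'].value_counts() > 300
# Names of the types that pass the threshold
tc = target_types[target_types == True].index

def myfilter(x):
    return x in tc

# True marks the rows to keep, despite the column name
df['delete'] = df['type'].apply(lambda x: myfilter(x))
df = df[df['delete']]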
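The same filtering can be written more compactly with the Pandas isin() method. The sketch below is an alternative formulation, not the code used in the tutorial. It runs on a made-up table with the threshold lowered to 1 so that the effect is visible on six rows.

```python
import pandas as pd

# Made-up data: "bar" appears 3 times, "cafe" twice, "zoo" once.
toy = pd.DataFrame({"type": ["bar", "bar", "bar", "cafe", "cafe", "zoo"]})

threshold = 1  # illustrative value; the tutorial uses 300
counts = toy["type"].value_counts()
frequent_types = counts[counts > threshold].index

# Keep only the rows whose type is frequent enough.
filtered = toy[toy["type"].isin(frequent_types)]

print(sorted(filtered["type"].unique()))  # ['bar', 'cafe']
print(len(filtered))                      # 5
```

This version avoids the helper function and the temporary boolean column, and it works the same way on a GeoDataFrame.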
Now I check the number of remaining types.
There are 26 types.
Now I plot the choropleth map, simply by passing the column argument to the plot() function. I can show the legend by setting legend=True:
fig, ax = plt.subplots(1, 1, figsize=(15, 15))
df.plot(ax=ax, column='type', legend=True, cmap='viridis')
It is interesting to note that the majority of points of interest are located in Northern Italy.
Operations on the Geometry
GeoPandas supports many operations directly on the geometry field. For example, I can calculate the distance of each point from a given point, e.g. Rome, the Italian capital. I convert the coordinates to a geometry through the points_from_xy() function:
rome_longitude = [12.496365]
rome_latitude = [41.902782]
# points_from_xy() returns an array of points; take the single Point it contains
rome_point = gpd.points_from_xy(rome_longitude, rome_latitude)[0]
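To make the return value of points_from_xy() concrete, the short sketch below inspects it for the Rome coordinates. The variable names points and rome are illustrative. The function returns an array with one point per coordinate pair, and indexing it yields an individual shapely Point.

```python
import geopandas as gpd

# One longitude/latitude pair, as in the tutorial.
points = gpd.points_from_xy([12.496365], [41.902782])

n_points = len(points)  # one point per coordinate pair
rome = points[0]        # a single shapely Point

print(n_points, rome.geom_type, rome.x, rome.y)
```

Note the argument order: x is the longitude and y is the latitude.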
Then, I calculate the distance of each point in df from rome_point. I use the distance() function, which is applied to the active geometry:
df['distance'] = df['geometry'].distance(rome_point)
I sort the dataset by increasing distance:
df = df.sort_values(by='distance', ascending=True)
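Finally, I select only the points of interest near Rome, i.e. those at a distance of less than 0.2. The distance is expressed in the units of the coordinates, which are degrees when the geometry stores longitude and latitude: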
df_rome = df[df['distance'] < 0.2]
Then, I plot the resulting dataframe:
fig, ax = plt.subplots(1, 1, figsize=(15, 15))
df_rome.plot(ax=ax, column='type', legend=True, cmap='viridis')
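The distance, sorting, and filtering steps above can be reproduced end to end on a toy dataset. In this sketch the three points of interest and their names are made up, and no coordinate reference system is set, so distance() returns plain Euclidean distances in coordinate units (degrees here).

```python
import geopandas as gpd
from shapely.geometry import Point

rome = Point(12.496365, 41.902782)  # coordinates used in the tutorial

# Made-up points of interest (names and positions are illustrative).
toy = gpd.GeoDataFrame(
    {"name": ["near_rome", "milan", "rome_itself"]},
    geometry=[Point(12.5, 41.9), Point(9.19, 45.4642), rome],
)

# Distance of every point from Rome, in coordinate units.
toy["distance"] = toy.geometry.distance(rome)
toy = toy.sort_values(by="distance", ascending=True)

# Keep only the points closer than 0.2 units.
near = toy[toy["distance"] < 0.2]

print(near["name"].tolist())  # ['rome_itself', 'near_rome']
```

If distances in metres are needed, one option is to reproject the data to a projected coordinate reference system with to_crs() before calling distance().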
Congratulations! You have just learned how to represent geographical data in Python through GeoPandas!
You have learned how GeoPandas can be used to perform efficient operations on geodata. Although Pandas is excellent at many tasks, it is not ideal for working with geospatial data in location-aware applications. GeoPandas solves this problem by adding functionality well suited for geospatial data to Pandas.
You can download the code of this tutorial from my Github Repository.
If you have read this far, that is already a lot for me today. Thanks! You can read more about me in this article.