Hot Routes: Visualizing Crime Incidents on Streets with Python: A Geospatial Analysis
Crime incidents provide valuable insights into the safety of different areas within a city. In this geospatial analysis, we’ll explore how to visualize crime incidents on streets using Python, leveraging tools like Folium, GeoPandas, and Databricks Mosaic. By the end of this article, you’ll have the skills to create maps that can help you understand and communicate patterns in crime data.
Data Sources
For this geospatial analysis, I utilized publicly available datasets for New York City. The key datasets employed in this study are:
NYPD Shooting Incident Data (Historic): This dataset, hosted on the official website of the City of New York, offers a comprehensive historical record of shooting incidents within the city. It’s a valuable resource for understanding the spatial distribution of these incidents over time.
Geofabrik’s New York Data: To enrich our analysis with detailed street-level information, we used the OpenStreetMap extract for New York, available for download from Geofabrik. It forms the foundation of our geospatial analysis, providing comprehensive data on streets, buildings, and various points of interest within the city.
I loaded this data into a PostgreSQL (PostGIS) database using a Python API. The notebook for ingesting the geospatial data into PostGIS is at geo-spatial-processing-postgres-python/python_adb_code/geospatial_share/data_ingestion
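The ingestion notebook is not reproduced here, but as a rough sketch, GeoPandas can push both layers into PostGIS in a few lines. The file names and connection string below are placeholders rather than the exact ones used in the notebook, and the snippet assumes SQLAlchemy, psycopg2, and GeoAlchemy2 are installed alongside GeoPandas:
import geopandas as gpd
from sqlalchemy import create_engine

# Placeholder connection string; replace with your own server, user, and password
engine = create_engine("postgresql://<user_name>:<password>@postgresserver.postgres.database.azure.com:5432/postgres")

# Read the NYPD shooting incidents and the OSM street layer (placeholder file names)
shootings = gpd.read_file("nypd_shooting_incidents.geojson")
streets = gpd.read_file("nyc_osm_streets.shp")

# Write both layers to PostGIS as the tables queried later in this article
shootings.to_postgis("geodata_nypd_shooting_json", engine, if_exists="replace", index=False)
streets.to_postgis("geodata_nyc_street_json", engine, if_exists="replace", index=False)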
Installing Necessary Packages
Before diving into the analysis, let’s make sure we have the necessary Python packages installed. We’ll need Databricks Mosaic for geospatial data processing, Folium for interactive maps, and GeoPandas for working with geographic data. Here’s how you can install them:
import subprocess

# List of package names you want to install
package_names = ["databricks-mosaic", "folium", "geopandas"]

# Use subprocess to run the pip install command for each package
for package_name in package_names:
    try:
        subprocess.check_call(["pip", "install", package_name])
        print(f"Successfully installed {package_name}")
    except subprocess.CalledProcessError as e:
        print(f"Error installing {package_name}: {e}")
Now that we have our tools in place, let’s move on to setting up our environment.
Setting Up Your Environment
To perform geospatial analysis and visualization at scale, we need to enable Databricks Mosaic, which registers a set of spatial functions on the Spark session and lets us work with spatial data efficiently. You might have different requirements depending on your setup, but make sure the necessary libraries and configurations are enabled, roughly as in the sketch below.
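As a minimal sketch, assuming the databricks-mosaic package installed earlier and a Databricks notebook where spark and dbutils are already defined, enabling Mosaic looks roughly like this:
import mosaic as mos

# Register Mosaic's spatial functions on the current Spark session
mos.enable_mosaic(spark, dbutils)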
Data Retrieval and Transformation
Our analysis is based on crime incident data stored in a PostgreSQL database. We’ll use SQL queries to extract the relevant data. Here’s a snippet of how we access and transform the data:
# Define the database connection URL
url = "jdbc:postgresql://postgresserver.postgres.database.azure.com:5432/postgres?tcpKeepAlive=true&prepareThreshold=-1&binaryTransfer=true&defaultRowFetchSize=10000"
# Set the database properties
db_properties = {
    "user": "<user_name>",
    "password": "<password>",
    "driver": "org.postgresql.Driver"
}
# SQL query to create the table and populate it
create_table_query = """
CREATE TABLE tbl_line_vs_shooting_point AS
SELECT
    ST_DWithin(p.geometry, l.geometry, 0.0001) AS point_intersects_line,
    ST_Distance(p.geometry, l.geometry) AS distance,
    l.st_name,
    p.boro,
    l.geometry AS line_geom,
    p.geometry AS point_geom,
    p.incident_key,
    l.r_blkfc_id
FROM
    geodata_nypd_shooting_json p
JOIN
    geodata_nyc_street_json l
ON
    ST_DWithin(p.geometry, l.geometry, 0.0001)
"""
# Execute the create table query; spark.read.jdbc only wraps SELECT statements,
# so the DDL is run through a plain JDBC connection from the JVM's DriverManager
driver_manager = spark.sparkContext._jvm.java.sql.DriverManager
connection = driver_manager.getConnection(url, db_properties["user"], db_properties["password"])
statement = connection.createStatement()
statement.executeUpdate(create_table_query)
statement.close()
connection.close()
# SQL query to select all records from the created table
select_query = "SELECT * FROM tbl_line_vs_shooting_point"
# Execute the select query
result_df = spark.read.jdbc(url, "({}) AS temp".format(select_query), properties=db_properties)
# Show the result DataFrame
result_df.show()
With our data in hand, we can proceed to analyze it.
Analyzing the Data
We first bring the Spark result into pandas, then count the number of crime incidents per street or line geometry. This step allows us to identify streets with multiple incidents. Here’s how we do it:
# Convert the Spark DataFrame to pandas for local analysis; 'merged_line' is
# assumed to hold the street geometry as WKT (merged in a step not shown here)
pandas_df = result_df.toPandas()

# Count the number of distinct crime incidents per street or line geometry
counts_by_seg = pandas_df.groupby('merged_line')['incident_key'].nunique().reset_index(name='count')

# Select streets where the number of incidents is more than 1
counts_by_seg2 = counts_by_seg[counts_by_seg['count'] > 1]

# Summary/distribution of incident counts per street
summary_stats = counts_by_seg2['count'].describe()
print(summary_stats)
We’ve now prepared our data for visualization. Let’s proceed to create a visually appealing map.
Visualizing the Data
We’ll use the Folium library to create an interactive map that displays streets with multiple crime incidents. Additionally, we’ll use a color palette to emphasize the intensity of incidents on each street:
import folium
from branca.colormap import linear
from shapely import wkt

# Create a color palette scaled to the incident counts
color_palette = linear.Reds_03.scale(0, 9)
color_palette.caption = 'crime_incidents'

# Create a map centered on New York City using the CartoDB dark_matter tiles
m = folium.Map(location=[40.7128, -74.0060], zoom_start=12, tiles="CartoDB dark_matter")

# Add one polyline per street, colored by its incident count
for _, row in counts_by_seg2.iterrows():
    wkt_line = wkt.loads(row["merged_line"])
    # Folium expects (lat, lon) pairs, while the WKT stores (lon, lat)
    reversed_coordinates = [(lat, lon) for lon, lat in wkt_line.coords]
    folium.PolyLine(locations=reversed_coordinates, color=color_palette(row['count']), opacity=1, weight=5).add_to(m)

# Attach the color scale legend to the map
color_palette.add_to(m)

# Display the map
display(m)

# Save the map as an HTML file
m.save('/path/to/map_with_crime_incidents.html')
[Map: New York City streets rendered with Folium, colored by the number of crime incidents]