import pandas as pd
import geopandas as gp # Useful for manipulating geographic data in dataframes
import mapclassify # Geopandas uses these to produce plots on Open Street Maps
import folium # Geopandas uses these to produce plots on Open Street Maps
Ottawa Police Open Data
Data Provider
Ottawa Police’s Community Safety Data Portal is an open-access data portal designed to provide transparent and accessible information about police activity to the public. Data is available in tabular and geographic forms, and can be freely downloaded and interacted with. All data from this portal is subject to Open Data Licence Version 2.0. Information released under this licence may be copied, modified, published, adapted, or otherwise used for any lawful purpose, as long as a data attibution from the information provider is included. When using information from multiple providers, the following attribution must be used: Contains information licensed under the Open Government Licence – Ottawa Police Service.
Plotting Geographical Data Points on Open Street Maps
library(leaflet)
library(sf)
library(tidyverse)
A word about retrieving Ottawa Police Data.
Ottawa Police datasets are available to download as CSVs, however the download URL is generated on demand and expires shortly, within a few days or sometimes even hours. Also, the availability of dataset csv download is somewhat tenuous, and you may find that particular dataset is unavailable for download at a given time. For this reason, it is recommended that the dataset of interest be downloaded into a working directory. It can then be read using read_csv directly from that file. Errors may occur if a download url is passed directly to read_csv.
Shootings
#https://data.ottawapolice.ca/datasets/aeb09d77912246139014516c8eeefcd9_0/about
= pd.read_csv("Shootings_Open_Data_3086844035331583488.csv")
shootings
shootings.head()
OBJECTID ID ... x y
0 1 5825 ... 370825.7260 5.025379e+06
1 2 8759 ... 370838.8789 5.026881e+06
2 3 7144 ... 371033.7260 5.026355e+06
3 4 9156 ... 371101.1351 5.026191e+06
4 5 7144 ... 371033.7260 5.026355e+06
[5 rows x 20 columns]
The location of each reported shooting is given in the EPSG:2951 coordinate system, represented in the X and Y columns. In order to plot these points on Open Street Maps, we pass the pandas dataframe to a Geopandas dataframe, which is better suited for geographical data. The Geopandas explore function usings libraries folium and mapclassify to plot on open street maps.
= gp.GeoDataFrame(
geo_df =gp.points_from_xy(shootings["x"], shootings["y"]), crs="epsg:2951")
shootings, geometry
geo_df.explore()
#https://data.ottawapolice.ca/datasets/aeb09d77912246139014516c8eeefcd9_0/about
<- "Shootings_Open_Data_3086844035331583488.csv"
csv_link
<- read_csv(csv_link) shootings
Rows: 413 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Reported Date, Occurred Date, Time of Day, Weekday, Neighbourhood,...
dbl (11): OBJECTID, ID, Reported Hour, Reported Year, Occurred Hour, Occurre...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(shootings)
# A tibble: 6 × 20
OBJECTID ID `Reported Date` `Reported Hour` `Reported Year` `Occurred Date`
<dbl> <dbl> <chr> <dbl> <dbl> <chr>
1 1 5825 7/5/2022 5:00:… 1200 2022 7/5/2022 5:00:…
2 2 8759 4/4/2022 5:00:… 1600 2022 4/4/2022 5:00:…
3 3 7144 6/6/2021 5:00:… 200 2021 6/6/2021 5:00:…
4 4 9156 5/22/2018 5:00… 100 2018 5/21/2018 5:00…
5 5 7144 1/16/2018 5:00… 1600 2018 1/16/2018 5:00…
6 6 7144 6/23/2021 5:00… 200 2021 6/23/2021 5:00…
# ℹ 14 more variables: `Occurred Hour` <dbl>, `Occurred Year` <dbl>,
# `Time of Day` <chr>, Weekday <chr>, `Day of Week` <dbl>,
# Neighbourhood <chr>, Sector <dbl>, Division <chr>, Ward <chr>,
# Councillor <chr>, `Census Tract` <dbl>, `Level of Injury` <chr>, x <dbl>,
# y <dbl>
The location of each reported shooting is given in the EPSG:2951 coordinate system, represented in the X and Y columns. In order to plot these points on Open Street Maps using the [Leaflet] (https://rstudio.github.io/leaflet/) library, we will need to convert to Latitude and Longitude, also known as the EPSG:4386 coordinate system. R’s Simple Features (sf) library can be used to manipulate spatial data. [Simple Features] (https://cran.r-project.org/web/packages/sf/index.html).
<- st_as_sf(shootings, coords = c("x", "y"), crs = 2951) |> # Tell R that the X and Y columns represent location coordinates in the EPSG:2951 system
shootingslatlong st_transform(crs = 4386 ) |> # Transform the X and Y location data into the EPSG:4386 system, aka latitude and longitude.
st_coordinates() # Extract the latitude and longitude data into a new dataframe that we will plot
leaflet() |>
addTiles() |>
addMarkers(data = shootingslatlong)
Bike Thefts
#https://data.ottawapolice.ca/datasets/eff8a6410ec74136b5f611017e244a4e_0/about
= pd.read_csv("Bike_Theft_Open_Data_3884822422354997826.csv")
bike_thefts
bike_thefts.head()
OBJECTID ID Year ... Occurred Hour x y
0 1 2959 2018 ... 1900.0 368056.8695 5.031032e+06
1 2 5058 2018 ... 600.0 367201.7041 5.030721e+06
2 3 10999 2018 ... 1700.0 368325.7049 5.032417e+06
3 4 11000 2018 ... 1700.0 368325.7049 5.032417e+06
4 5 10998 2018 ... 800.0 368325.7049 5.032417e+06
[5 rows x 28 columns]
= gp.GeoDataFrame(
geo_df =gp.points_from_xy(bike_thefts["x"], bike_thefts["y"]), crs="epsg:2951")
bike_thefts, geometry
"Bicycle Value"] > 2000].explore() geo_df[geo_df[
#https://data.ottawapolice.ca/datasets/eff8a6410ec74136b5f611017e244a4e_0/about
<- "Bike_Theft_Open_Data_3884822422354997826.csv"
csv_link
<- read_csv(csv_link) bike_thefts
Rows: 12130 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Reported Date, Occurred Date, Offence Category, Bicycle Style, Bic...
dbl (10): OBJECTID, ID, Year, Bicycle Value, Bicycle Speed, Census Tract, Re...
lgl (2): Day of Week, Bicycle Frame
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- bike_thefts |> rename_all(make.names)
bike_thefts
head(bike_thefts)
# A tibble: 6 × 28
OBJECTID ID Year Reported.Date Occurred.Date Day.of.Week Offence.Category
<dbl> <dbl> <dbl> <chr> <chr> <lgl> <chr>
1 1 2959 2018 1/5/2018 5:00… 1/4/2018 5:0… NA Theft< Bicycle
2 2 5058 2018 1/8/2018 5:00… 1/8/2018 5:0… NA Prop Found/ Rec
3 3 10999 2018 1/22/2018 5:0… 1/22/2018 5:… NA Theft< Bicycle
4 4 11000 2018 1/22/2018 5:0… 1/22/2018 5:… NA Theft< Bicycle
5 5 10998 2018 1/23/2018 5:0… 1/19/2018 5:… NA Theft< Bicycle
6 6 3248 2018 1/23/2018 5:0… 1/21/2018 5:… NA Theft< Bicycle
# ℹ 21 more variables: Bicycle.Style <chr>, Bicycle.Value <dbl>,
# Bicycle.Make <chr>, Bicycle.Model <chr>, Bicycle.Type <chr>,
# Bicycle.Frame <lgl>, Bicycle.Colour <chr>, Bicycle.Speed <dbl>,
# Neighbourhood <chr>, Sector <chr>, Division <chr>, Census.Tract <dbl>,
# Status <chr>, Intersection <chr>, Time.of.Day <chr>, Ward <chr>,
# Councillor <chr>, Reported.Hour <dbl>, Occurred.Hour <dbl>, x <dbl>,
# y <dbl>
<- st_as_sf(bike_thefts |> subset(Bicycle.Value > 2000), coords = c("x", "y"), crs = 2951) |>
expensive_bike_theftslatlong st_transform(crs = 4386 ) |>
st_coordinates()
leaflet() |>
addTiles() |>
addMarkers(data = expensive_bike_theftslatlong)
Motor Vehicle Thefts
#https://data.ottawapolice.ca/datasets/e5453a203e2a4baeac32ac4e70ce852c_0/about
= pd.read_csv("Auto_Theft_Open_Data_2158590371504723284.csv")
vehicle_theft
vehicle_theft.head()
OBJECTID Vehicle Year ... x y
0 1 NaN ... 367986.8978 5.031477e+06
1 2 2017.0 ... 381704.9517 5.032668e+06
2 3 NaN ... 350914.7854 5.013093e+06
3 4 2004.0 ... 365187.1427 5.014770e+06
4 5 2012.0 ... 360961.8697 5.020792e+06
[5 rows x 24 columns]
"Vehicle Make"].unique() vehicle_theft[
array([nan, 'JEEP', 'INFI', 'DODG', 'TOYT', 'HYUN', 'HOND', 'SUBA', 'GMC',
'FORD', 'MITS', 'MINI', 'LEXS', 'CHEV', 'NISS', 'MERZ', 'AUDI',
'KUBO', 'KIA', 'MAZD', 'STOU', 'BMW', 'PONT', 'VOLK', 'CATE',
'ISU', 'BUIC', 'MALI', 'INTI', 'VOLV', 'CADI', 'SUZI', 'DEER',
'WILD', 'BOCT', 'ACUR', 'KAWK', 'CHRY', 'HSQV', 'CASE', 'ASTR',
'CUB', 'GM', 'FRHT', 'RAZR', 'JDJ', 'AUTU', 'HMDE', 'YAMA', 'HINO',
'PONI', 'BOMB', 'STEA', 'LNDR', 'JAGU', 'INTL', 'KRT', 'NITR',
'TRL', 'RAM', 'STRN', 'POLS', 'SATR', 'TOWR', 'KOMA', 'CAME',
'SCIO', 'MISK', 'LINC', 'VECO', 'CSHO', 'OLDS', 'SUNB', 'MANA',
'GIAN', 'PORS', 'CUBE', 'PLYM', 'UHAU', 'FORT', 'INTE', 'GIO',
'CANA', 'MERC', 'NEWH', 'TORO', 'PIAG', 'JAWA', 'FIAT', 'TREX',
'SNOW', 'EMMO', 'POLA', 'SCOT', 'NOTH', 'ARCT', 'KUGO', 'CARG',
'GLEN', 'MOSC', 'TOYO', 'KYMC', 'EBIK', 'VECT', 'NINE', 'SMAR',
'ARTC', 'MONA', 'EMO', 'GOTR', 'AURI', 'FREE', 'JOHN', 'LOAD',
'SMRT', 'LUND', 'DAYM', 'TAO', 'BIRD', '2010', 'GATO', 'DERA',
'PEDA', 'MNNI', 'TESL', 'CFMO', 'TRA', 'MISS', 'WEBE', 'HOME',
'META', 'AMGE', 'SAA', 'GEN', 'STIR', 'HUMM', 'EZHL', 'HD', 'NCM',
'SEGW', 'CROW', 'BRI', 'TRIT', 'PEGA', 'ZERO', 'CAT', 'BEAR',
'BIKT', 'STER', 'NIU', 'MAXI', 'CRV', 'POLE', 'DODD', 'LAND',
'MACK', 'SHOR', 'GEO', 'MOTO', 'TOY', 'MOFF', 'SGAC', 'PEAC',
'NEW', 'NEO', 'TERE', 'HOMD', 'NEXH', 'EASY', 'KTRL', 'WACK',
'THOR', 'PTRB', 'CAMI', 'ETRI', 'AMGN', 'KARA', 'KTR'],
dtype=object)
= gp.GeoDataFrame(
vehicle_theft_geo =gp.points_from_xy(vehicle_theft["x"], vehicle_theft["y"]), crs="epsg:2951")
vehicle_theft, geometry
"Vehicle Make"] == "VOLK"].explore() vehicle_theft_geo[vehicle_theft_geo[
#https://data.ottawapolice.ca/datasets/e5453a203e2a4baeac32ac4e70ce852c_0/about
<- "Auto_Theft_Open_Data_2158590371504723284.csv"
csv_link
<- read_csv(csv_link) vehicle_theft
Rows: 7865 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Vehicle Make, Vehicle Model, Vehicle Style, Vehicle Colour, Weekda...
dbl (9): OBJECTID, Vehicle Year, Vehicle Value, Year, Census Tract, Reporte...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- vehicle_theft |> rename_all(make.names)
vehicle_theft
head(vehicle_theft)
# A tibble: 6 × 24
OBJECTID Vehicle.Year Vehicle.Make Vehicle.Model Vehicle.Style Vehicle.Colour
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 NA <NA> <NA> <NA> BLK
2 2 2017 JEEP WRG CARRY BLK
3 3 NA <NA> <NA> MOTO BLK
4 4 2004 INFI G35 4DR BLK
5 5 2012 DODG R15 PICKUP RED
6 6 2000 TOYT CAM 4DR BRN
# ℹ 18 more variables: Vehicle.Value <dbl>, Weekday <chr>, Recovered <chr>,
# Neighbourhood <chr>, Ward <chr>, Sector <chr>, Reported.Date <chr>,
# Occurred.Date <chr>, Year <dbl>, Intersection <chr>, Division <chr>,
# Census.Tract <dbl>, Time.of.Day <chr>, Councillor <chr>,
# Reported.Hour <dbl>, Occurred.Hour <dbl>, x <dbl>, y <dbl>
<- st_as_sf(vehicle_theft |> subset(Vehicle.Make == "VOLK"), coords = c("x", "y"), crs = 2951) |>
stolen_volk_latlong st_transform(crs = 4386 ) |>
st_coordinates()
leaflet() |>
addTiles() |>
addMarkers(data = stolen_volk_latlong)
Historical Crime Data 2013 - 2022
#https://data.ottawapolice.ca/datasets/25a533e5e7c945faadf37765639657a5_16/about
= pd.read_csv("CrimeMap_YE_-5915901148112462236.csv") crime
<string>:4: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
crime.head()
OBJECTID GO_Number ... x y
0 1 GO 2021250904 ... 371119.9183 5.026146e+06
1 2 GO 2021814441 ... 364615.1637 5.025050e+06
2 3 GO 2018804720 ... 371101.3274 5.026190e+06
3 4 GO 2018192790 ... 365573.8943 5.025345e+06
4 5 GO 20222711 ... 371119.9183 5.026146e+06
[5 rows x 28 columns]
#https://data.ottawapolice.ca/datasets/25a533e5e7c945faadf37765639657a5_16/about
<- "CrimeMap_YE_-5915901148112462236.csv"
csv_link
<- read_csv(csv_link) crime
Rows: 165895 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): GO_Number, Offence_Summary, Offence_Category, Reported_Date, Occu...
dbl (11): OBJECTID, Reported_Year, Reported_Hour, Occurred_Year, Occurred_H...
date (2): Occurred_Date_str, Reported_Date_str
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- crime|> rename_all(make.names)
crime
head(crime)
# A tibble: 6 × 28
OBJECTID GO_Number Offence_Summary Offence_Category Reported_Date
<dbl> <chr> <chr> <chr> <chr>
1 1 GO 2021250904 Crime Against Proper… Break and Enter 10/1/2021 5:…
2 2 GO 2021814441 Crime Against Proper… Theft $5000 and… 9/27/2021 5:…
3 3 GO 2018804720 Crime Against Proper… Mischief 4/23/2018 5:…
4 4 GO 2018192790 Crime Against Proper… Break and Enter 8/5/2018 5:0…
5 5 GO 20222711 Crime Against Proper… Theft $5000 and… 1/4/2022 5:0…
6 6 GO 2022345612 Crime Against Proper… Theft $5000 and… 12/31/2022 5…
# ℹ 23 more variables: Reported_Year <dbl>, Reported_Hour <dbl>,
# Occurred_Date <chr>, Occurred_Year <dbl>, Occurred_Hour <dbl>,
# Time_of_Day <chr>, Week_Day <chr>, Day_of_Week <dbl>, Intersection <chr>,
# Neighbourhood <chr>, Sector <chr>, Division <chr>, Ward <chr>,
# Councillor <chr>, Reported.Month <dbl>, Occurred_Date_str <date>,
# Occurred_Year_str <dbl>, Occurred_Hour_str <chr>, Reported_Date_str <date>,
# Reported_Year_str <dbl>, Reported_Hour_str <chr>, x <dbl>, y <dbl>
Note that this dataset contains over 165,000 records! That is too many to plot on a single map, so it’s worth filtering for particular crimes and dates before plotting.
"Offence_Category"].unique() crime[
array(['Break and Enter', 'Theft $5000 and Under', 'Mischief', 'Robbery',
'Assaults', 'Theft Over $5000',
'Indecent or Harassing Communications', 'Uttering Threats',
'Theft of Motor Vehicle', 'Criminal Harassment', 'Arson',
'Attempted Murder', 'Homicide'], dtype=object)
"Reported_Year"].unique() crime[
array([2021, 2018, 2022, 2020, 2019, 2023])
$Offence_Category |> unique() crime
[1] "Break and Enter"
[2] "Theft $5000 and Under"
[3] "Mischief"
[4] "Robbery"
[5] "Assaults"
[6] "Theft Over $5000"
[7] "Indecent or Harassing Communications"
[8] "Uttering Threats"
[9] "Theft of Motor Vehicle"
[10] "Criminal Harassment"
[11] "Arson"
[12] "Attempted Murder"
[13] "Homicide"
$Reported_Year |> unique() crime
[1] 2021 2018 2022 2020 2019 2023
= gp.GeoDataFrame(
crime_geo =gp.points_from_xy(crime["x"], crime["y"]), crs="epsg:2951")
crime, geometry
"Offence_Category"] == "Break and Enter") &
crime_geo[(crime_geo["Reported_Year"] == 2023)].explore() (crime_geo[
<- st_as_sf(crime |> subset(Offence_Category == "Break and Enter" & Reported_Year == 2023),
bande2023latlong coords = c("x", "y"),
crs = 2951) |>
st_transform(crs = 4386 ) |>
st_coordinates()
leaflet() |>
addTiles() |>
addMarkers(data = bande2023latlong)