Ottawa Police Open Data

Ottawa
Open Street Maps
csv files
Software:Python
Software:R
Author

Dean Jayatilleke & Dave Campbell

Published

July 10, 2024

Data Provider

Ottawa Police’s Community Safety Data Portal is an open-access data portal designed to provide transparent and accessible information about police activity to the public. Data is available in tabular and geographic forms, and can be freely downloaded and interacted with. All data from this portal is subject to Open Data Licence Version 2.0. Information released under this licence may be copied, modified, published, adapted, or otherwise used for any lawful purpose, as long as a data attibution from the information provider is included. When using information from multiple providers, the following attribution must be used: Contains information licensed under the Open Government Licence – Ottawa Police Service.

Plotting Geographical Data Points on Open Street Maps


import pandas as pd 
import geopandas as gp # Useful for manipulating geographic data in dataframes
import mapclassify # Geopandas uses these to produce plots on Open Street Maps 
import folium # Geopandas uses these to produce plots on Open Street Maps 
library(leaflet) 
library(sf)
library(tidyverse)

A word about retrieving Ottawa Police Data.

Ottawa Police datasets are available to download as CSVs, however the download URL is generated on demand and expires shortly, within a few days or sometimes even hours. Also, the availability of dataset csv download is somewhat tenuous, and you may find that particular dataset is unavailable for download at a given time. For this reason, it is recommended that the dataset of interest be downloaded into a working directory. It can then be read using read_csv directly from that file. Errors may occur if a download url is passed directly to read_csv.

Shootings

#https://data.ottawapolice.ca/datasets/aeb09d77912246139014516c8eeefcd9_0/about

shootings = pd.read_csv("Shootings_Open_Data_3086844035331583488.csv")

shootings.head()
   OBJECTID    ID  ...            x             y
0         1  5825  ...  370825.7260  5.025379e+06
1         2  8759  ...  370838.8789  5.026881e+06
2         3  7144  ...  371033.7260  5.026355e+06
3         4  9156  ...  371101.1351  5.026191e+06
4         5  7144  ...  371033.7260  5.026355e+06

[5 rows x 20 columns]

The location of each reported shooting is given in the EPSG:2951 coordinate system, represented in the X and Y columns. In order to plot these points on Open Street Maps, we pass the pandas dataframe to a Geopandas dataframe, which is better suited for geographical data. The Geopandas explore function usings libraries folium and mapclassify to plot on open street maps.

geo_df = gp.GeoDataFrame(
    shootings, geometry=gp.points_from_xy(shootings["x"], shootings["y"]), crs="epsg:2951")

geo_df.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
#https://data.ottawapolice.ca/datasets/aeb09d77912246139014516c8eeefcd9_0/about

csv_link <- "Shootings_Open_Data_3086844035331583488.csv"

shootings <- read_csv(csv_link)
Rows: 413 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (9): Reported Date, Occurred Date, Time of Day, Weekday, Neighbourhood,...
dbl (11): OBJECTID, ID, Reported Hour, Reported Year, Occurred Hour, Occurre...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(shootings)
# A tibble: 6 × 20
  OBJECTID    ID `Reported Date` `Reported Hour` `Reported Year` `Occurred Date`
     <dbl> <dbl> <chr>                     <dbl>           <dbl> <chr>          
1        1  5825 7/5/2022 5:00:…            1200            2022 7/5/2022 5:00:…
2        2  8759 4/4/2022 5:00:…            1600            2022 4/4/2022 5:00:…
3        3  7144 6/6/2021 5:00:…             200            2021 6/6/2021 5:00:…
4        4  9156 5/22/2018 5:00…             100            2018 5/21/2018 5:00…
5        5  7144 1/16/2018 5:00…            1600            2018 1/16/2018 5:00…
6        6  7144 6/23/2021 5:00…             200            2021 6/23/2021 5:00…
# ℹ 14 more variables: `Occurred Hour` <dbl>, `Occurred Year` <dbl>,
#   `Time of Day` <chr>, Weekday <chr>, `Day of Week` <dbl>,
#   Neighbourhood <chr>, Sector <dbl>, Division <chr>, Ward <chr>,
#   Councillor <chr>, `Census Tract` <dbl>, `Level of Injury` <chr>, x <dbl>,
#   y <dbl>

The location of each reported shooting is given in the EPSG:2951 coordinate system, represented in the X and Y columns. In order to plot these points on Open Street Maps using the [Leaflet] (https://rstudio.github.io/leaflet/) library, we will need to convert to Latitude and Longitude, also known as the EPSG:4386 coordinate system. R’s Simple Features (sf) library can be used to manipulate spatial data. [Simple Features] (https://cran.r-project.org/web/packages/sf/index.html).

shootingslatlong <- st_as_sf(shootings, coords = c("x", "y"), crs = 2951) |> # Tell R that the X and Y columns represent location coordinates in the EPSG:2951 system
                    st_transform(crs = 4386 ) |> # Transform the X and Y location data into the EPSG:4386 system, aka latitude and longitude.
                    st_coordinates() # Extract the latitude and longitude data into a new dataframe that we will plot
leaflet() |>
  addTiles() |>
  addMarkers(data = shootingslatlong)

Bike Thefts

#https://data.ottawapolice.ca/datasets/eff8a6410ec74136b5f611017e244a4e_0/about

bike_thefts = pd.read_csv("Bike_Theft_Open_Data_3884822422354997826.csv")

bike_thefts.head()
   OBJECTID     ID  Year  ... Occurred Hour            x             y
0         1   2959  2018  ...        1900.0  368056.8695  5.031032e+06
1         2   5058  2018  ...         600.0  367201.7041  5.030721e+06
2         3  10999  2018  ...        1700.0  368325.7049  5.032417e+06
3         4  11000  2018  ...        1700.0  368325.7049  5.032417e+06
4         5  10998  2018  ...         800.0  368325.7049  5.032417e+06

[5 rows x 28 columns]
geo_df = gp.GeoDataFrame(
    bike_thefts, geometry=gp.points_from_xy(bike_thefts["x"], bike_thefts["y"]), crs="epsg:2951")

geo_df[geo_df["Bicycle Value"] > 2000].explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
#https://data.ottawapolice.ca/datasets/eff8a6410ec74136b5f611017e244a4e_0/about

csv_link <- "Bike_Theft_Open_Data_3884822422354997826.csv"

bike_thefts <- read_csv(csv_link)
Rows: 12130 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Reported Date, Occurred Date, Offence Category, Bicycle Style, Bic...
dbl (10): OBJECTID, ID, Year, Bicycle Value, Bicycle Speed, Census Tract, Re...
lgl  (2): Day of Week, Bicycle Frame

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bike_thefts <- bike_thefts |> rename_all(make.names)

head(bike_thefts)
# A tibble: 6 × 28
  OBJECTID    ID  Year Reported.Date  Occurred.Date Day.of.Week Offence.Category
     <dbl> <dbl> <dbl> <chr>          <chr>         <lgl>       <chr>           
1        1  2959  2018 1/5/2018 5:00… 1/4/2018 5:0… NA          Theft< Bicycle  
2        2  5058  2018 1/8/2018 5:00… 1/8/2018 5:0… NA          Prop Found/ Rec 
3        3 10999  2018 1/22/2018 5:0… 1/22/2018 5:… NA          Theft< Bicycle  
4        4 11000  2018 1/22/2018 5:0… 1/22/2018 5:… NA          Theft< Bicycle  
5        5 10998  2018 1/23/2018 5:0… 1/19/2018 5:… NA          Theft< Bicycle  
6        6  3248  2018 1/23/2018 5:0… 1/21/2018 5:… NA          Theft< Bicycle  
# ℹ 21 more variables: Bicycle.Style <chr>, Bicycle.Value <dbl>,
#   Bicycle.Make <chr>, Bicycle.Model <chr>, Bicycle.Type <chr>,
#   Bicycle.Frame <lgl>, Bicycle.Colour <chr>, Bicycle.Speed <dbl>,
#   Neighbourhood <chr>, Sector <chr>, Division <chr>, Census.Tract <dbl>,
#   Status <chr>, Intersection <chr>, Time.of.Day <chr>, Ward <chr>,
#   Councillor <chr>, Reported.Hour <dbl>, Occurred.Hour <dbl>, x <dbl>,
#   y <dbl>
expensive_bike_theftslatlong <- st_as_sf(bike_thefts |> subset(Bicycle.Value > 2000), coords = c("x", "y"), crs = 2951) |> 
                                st_transform(crs = 4386 ) |> 
                                st_coordinates() 
leaflet() |>
  addTiles() |>
  addMarkers(data = expensive_bike_theftslatlong)

Motor Vehicle Thefts

#https://data.ottawapolice.ca/datasets/e5453a203e2a4baeac32ac4e70ce852c_0/about

vehicle_theft = pd.read_csv("Auto_Theft_Open_Data_2158590371504723284.csv")

vehicle_theft.head()
   OBJECTID  Vehicle Year  ...            x             y
0         1           NaN  ...  367986.8978  5.031477e+06
1         2        2017.0  ...  381704.9517  5.032668e+06
2         3           NaN  ...  350914.7854  5.013093e+06
3         4        2004.0  ...  365187.1427  5.014770e+06
4         5        2012.0  ...  360961.8697  5.020792e+06

[5 rows x 24 columns]
vehicle_theft["Vehicle Make"].unique()
array([nan, 'JEEP', 'INFI', 'DODG', 'TOYT', 'HYUN', 'HOND', 'SUBA', 'GMC',
       'FORD', 'MITS', 'MINI', 'LEXS', 'CHEV', 'NISS', 'MERZ', 'AUDI',
       'KUBO', 'KIA', 'MAZD', 'STOU', 'BMW', 'PONT', 'VOLK', 'CATE',
       'ISU', 'BUIC', 'MALI', 'INTI', 'VOLV', 'CADI', 'SUZI', 'DEER',
       'WILD', 'BOCT', 'ACUR', 'KAWK', 'CHRY', 'HSQV', 'CASE', 'ASTR',
       'CUB', 'GM', 'FRHT', 'RAZR', 'JDJ', 'AUTU', 'HMDE', 'YAMA', 'HINO',
       'PONI', 'BOMB', 'STEA', 'LNDR', 'JAGU', 'INTL', 'KRT', 'NITR',
       'TRL', 'RAM', 'STRN', 'POLS', 'SATR', 'TOWR', 'KOMA', 'CAME',
       'SCIO', 'MISK', 'LINC', 'VECO', 'CSHO', 'OLDS', 'SUNB', 'MANA',
       'GIAN', 'PORS', 'CUBE', 'PLYM', 'UHAU', 'FORT', 'INTE', 'GIO',
       'CANA', 'MERC', 'NEWH', 'TORO', 'PIAG', 'JAWA', 'FIAT', 'TREX',
       'SNOW', 'EMMO', 'POLA', 'SCOT', 'NOTH', 'ARCT', 'KUGO', 'CARG',
       'GLEN', 'MOSC', 'TOYO', 'KYMC', 'EBIK', 'VECT', 'NINE', 'SMAR',
       'ARTC', 'MONA', 'EMO', 'GOTR', 'AURI', 'FREE', 'JOHN', 'LOAD',
       'SMRT', 'LUND', 'DAYM', 'TAO', 'BIRD', '2010', 'GATO', 'DERA',
       'PEDA', 'MNNI', 'TESL', 'CFMO', 'TRA', 'MISS', 'WEBE', 'HOME',
       'META', 'AMGE', 'SAA', 'GEN', 'STIR', 'HUMM', 'EZHL', 'HD', 'NCM',
       'SEGW', 'CROW', 'BRI', 'TRIT', 'PEGA', 'ZERO', 'CAT', 'BEAR',
       'BIKT', 'STER', 'NIU', 'MAXI', 'CRV', 'POLE', 'DODD', 'LAND',
       'MACK', 'SHOR', 'GEO', 'MOTO', 'TOY', 'MOFF', 'SGAC', 'PEAC',
       'NEW', 'NEO', 'TERE', 'HOMD', 'NEXH', 'EASY', 'KTRL', 'WACK',
       'THOR', 'PTRB', 'CAMI', 'ETRI', 'AMGN', 'KARA', 'KTR'],
      dtype=object)
vehicle_theft_geo = gp.GeoDataFrame(
    vehicle_theft, geometry=gp.points_from_xy(vehicle_theft["x"], vehicle_theft["y"]), crs="epsg:2951")

vehicle_theft_geo[vehicle_theft_geo["Vehicle Make"] == "VOLK"].explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
#https://data.ottawapolice.ca/datasets/e5453a203e2a4baeac32ac4e70ce852c_0/about

csv_link <- "Auto_Theft_Open_Data_2158590371504723284.csv"

vehicle_theft <- read_csv(csv_link)
Rows: 7865 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Vehicle Make, Vehicle Model, Vehicle Style, Vehicle Colour, Weekda...
dbl  (9): OBJECTID, Vehicle Year, Vehicle Value, Year, Census Tract, Reporte...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vehicle_theft <- vehicle_theft |> rename_all(make.names)

head(vehicle_theft)
# A tibble: 6 × 24
  OBJECTID Vehicle.Year Vehicle.Make Vehicle.Model Vehicle.Style Vehicle.Colour
     <dbl>        <dbl> <chr>        <chr>         <chr>         <chr>         
1        1           NA <NA>         <NA>          <NA>          BLK           
2        2         2017 JEEP         WRG           CARRY         BLK           
3        3           NA <NA>         <NA>          MOTO          BLK           
4        4         2004 INFI         G35           4DR           BLK           
5        5         2012 DODG         R15           PICKUP        RED           
6        6         2000 TOYT         CAM           4DR           BRN           
# ℹ 18 more variables: Vehicle.Value <dbl>, Weekday <chr>, Recovered <chr>,
#   Neighbourhood <chr>, Ward <chr>, Sector <chr>, Reported.Date <chr>,
#   Occurred.Date <chr>, Year <dbl>, Intersection <chr>, Division <chr>,
#   Census.Tract <dbl>, Time.of.Day <chr>, Councillor <chr>,
#   Reported.Hour <dbl>, Occurred.Hour <dbl>, x <dbl>, y <dbl>
stolen_volk_latlong <- st_as_sf(vehicle_theft |> subset(Vehicle.Make == "VOLK"), coords = c("x", "y"), crs = 2951) |> 
                       st_transform(crs = 4386 ) |> 
                       st_coordinates() 

leaflet() |>
  addTiles() |>
  addMarkers(data = stolen_volk_latlong)

Historical Crime Data 2013 - 2022

#https://data.ottawapolice.ca/datasets/25a533e5e7c945faadf37765639657a5_16/about

crime = pd.read_csv("CrimeMap_YE_-5915901148112462236.csv")
<string>:4: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
crime.head()
   OBJECTID         GO_Number  ...            x             y
0         1  GO    2021250904  ...  371119.9183  5.026146e+06
1         2  GO    2021814441  ...  364615.1637  5.025050e+06
2         3  GO    2018804720  ...  371101.3274  5.026190e+06
3         4  GO    2018192790  ...  365573.8943  5.025345e+06
4         5    GO    20222711  ...  371119.9183  5.026146e+06

[5 rows x 28 columns]
#https://data.ottawapolice.ca/datasets/25a533e5e7c945faadf37765639657a5_16/about

csv_link <- "CrimeMap_YE_-5915901148112462236.csv"

crime <- read_csv(csv_link)
Rows: 165895 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (15): GO_Number, Offence_Summary, Offence_Category, Reported_Date, Occu...
dbl  (11): OBJECTID, Reported_Year, Reported_Hour, Occurred_Year, Occurred_H...
date  (2): Occurred_Date_str, Reported_Date_str

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crime <- crime|> rename_all(make.names)

head(crime)
# A tibble: 6 × 28
  OBJECTID GO_Number        Offence_Summary       Offence_Category Reported_Date
     <dbl> <chr>            <chr>                 <chr>            <chr>        
1        1 GO    2021250904 Crime Against Proper… Break and Enter  10/1/2021 5:…
2        2 GO    2021814441 Crime Against Proper… Theft $5000 and… 9/27/2021 5:…
3        3 GO    2018804720 Crime Against Proper… Mischief         4/23/2018 5:…
4        4 GO    2018192790 Crime Against Proper… Break and Enter  8/5/2018 5:0…
5        5 GO    20222711   Crime Against Proper… Theft $5000 and… 1/4/2022 5:0…
6        6 GO    2022345612 Crime Against Proper… Theft $5000 and… 12/31/2022 5…
# ℹ 23 more variables: Reported_Year <dbl>, Reported_Hour <dbl>,
#   Occurred_Date <chr>, Occurred_Year <dbl>, Occurred_Hour <dbl>,
#   Time_of_Day <chr>, Week_Day <chr>, Day_of_Week <dbl>, Intersection <chr>,
#   Neighbourhood <chr>, Sector <chr>, Division <chr>, Ward <chr>,
#   Councillor <chr>, Reported.Month <dbl>, Occurred_Date_str <date>,
#   Occurred_Year_str <dbl>, Occurred_Hour_str <chr>, Reported_Date_str <date>,
#   Reported_Year_str <dbl>, Reported_Hour_str <chr>, x <dbl>, y <dbl>

Note that this dataset contains over 165,000 records! That is too many to plot on a single map, so it’s worth filtering for particular crimes and dates before plotting.

crime["Offence_Category"].unique()
array(['Break and Enter', 'Theft $5000 and Under', 'Mischief', 'Robbery',
       'Assaults', 'Theft Over $5000',
       'Indecent or Harassing Communications', 'Uttering Threats',
       'Theft of Motor Vehicle', 'Criminal Harassment', 'Arson',
       'Attempted Murder', 'Homicide'], dtype=object)
crime["Reported_Year"].unique()
array([2021, 2018, 2022, 2020, 2019, 2023])
crime$Offence_Category |> unique()
 [1] "Break and Enter"                     
 [2] "Theft $5000 and Under"               
 [3] "Mischief"                            
 [4] "Robbery"                             
 [5] "Assaults"                            
 [6] "Theft Over $5000"                    
 [7] "Indecent or Harassing Communications"
 [8] "Uttering Threats"                    
 [9] "Theft of Motor Vehicle"              
[10] "Criminal Harassment"                 
[11] "Arson"                               
[12] "Attempted Murder"                    
[13] "Homicide"                            
crime$Reported_Year |> unique()
[1] 2021 2018 2022 2020 2019 2023
crime_geo = gp.GeoDataFrame(
    crime, geometry=gp.points_from_xy(crime["x"], crime["y"]), crs="epsg:2951")

crime_geo[(crime_geo["Offence_Category"] == "Break and Enter") &
          (crime_geo["Reported_Year"] == 2023)].explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
bande2023latlong <- st_as_sf(crime |> subset(Offence_Category == "Break and Enter" & Reported_Year == 2023), 
                    coords = c("x", "y"),
                    crs = 2951) |> 
                    st_transform(crs = 4386 ) |> 
                    st_coordinates() 

leaflet() |>
  addTiles() |>
  addMarkers(data = bande2023latlong)