Case Study: Understanding how Canada’s economy might be impacted by climate change.

Basic libraries and R version

version
##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          2.2                         
## year           2022                        
## month          10                          
## day            31                          
## svn rev        83211                       
## language       R                           
## version.string R version 4.2.2 (2022-10-31)
## nickname       Innocent and Trusting
library(tidyverse)   # for its nice flow.
library(lubridate)   # for handling dates
library(sf)          # for geo manipulations
library(geojsonsf)   # for loading geojson files
library(htmltools)   # for leaflet support
library(leaflet)     # for fancy plots

There are 3 data files.

Once unzipped, the following code will load the datasets.

Set up locations

# set up data locations:
location = paste0(getwd(),"/")

locationprod = paste0(location,"Productivity per NAICS within region/")
locationgeog = paste0(location,"geojson_files/")
locationclim = paste0(location,"climate_data/")

Define the provinces

Set up Canadian province names and their two-letter abbreviations.

province_codings = matrix(c("Newfoundland and Labrador",    "NL",   
                            "Prince Edward Island", "PE",   
                            "Nova Scotia",  "NS",   
                            "New Brunswick",    "NB",   
                            "Quebec",   "QC",   
                            "Ontario",  "ON",   
                            "Manitoba", "MB",   
                            "Saskatchewan", "SK",   
                            "British Columbia", "BC",   
                            "Alberta",  "AB",   
                            "Yukon",        "YT",   
                            "Northwest Territories",    "NT",   
                            "Nunavut",  "NU"),
                          ncol=2,byrow=TRUE, 
                          dimnames = list(NULL, c("province", "abv"))) %>%
  as_tibble()  # the matrix is already piped in; province_codings does not exist yet at this point

Productivity data: {by industry, month, geography}

This comes from combining National Monthly GDP data with Provincial Annual GDP, measures of effort (provincial monthly total hours worked), and the census counts of people working in different industries in different geographic locations.

All geographies are defined as Census Subdivisions with respect to the 2016 census GeoUIDs.

Load the data

Province = "Nova Scotia"
## or get the right spelling from the two-letter abbreviation code:
# Province = province_codings %>% filter(abv == "NS") %>% pull(province)
## note that there is a national file, but it's bigger and slower to play with
prod_data = read_csv(paste0(locationprod,"4.c_production_in_CSD_in",Province,".csv"), show_col_types = FALSE)

prod_data %>% glimpse
## Rows: 28,800
## Columns: 21
## $ Date                                                                                                           <date> …
## $ provincename                                                                                                   <chr> …
## $ production_in_division_X22.Utilities                                                                           <dbl> …
## $ production_in_division_X23.Construction                                                                        <dbl> …
## $ production_in_division_X31.33.Manufacturing                                                                    <dbl> …
## $ production_in_division_X48.49.Transportation.and.warehousing                                                   <dbl> …
## $ production_in_division_X61.Educational.services                                                                <dbl> …
## $ production_in_division_X62.Health.care.and.social.assistance                                                   <dbl> …
## $ production_in_division_X72.Accommodation.and.food.services                                                     <dbl> …
## $ production_in_division_X81.Other.services..except.public.administration.                                       <dbl> …
## $ production_in_division_X91.Public.administration                                                               <dbl> …
## $ production_in_division_X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.oil.and.gas.extraction <dbl> …
## $ production_in_division_X41.Wholesale.trade.44.45.Retail.trade                                                  <dbl> …
## $ production_in_division_X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing                         <dbl> …
## $ production_in_division_X54.Professional..scientific.and.technical.services.55.56                               <dbl> …
## $ production_in_division_X51.Information.culture.and.recreation.71                                               <dbl> …
## $ Population                                                                                                     <dbl> …
## $ GeoUID                                                                                                         <dbl> …
## $ census_year_ref                                                                                                <chr> …
## $ Dominant_NAICS                                                                                                 <chr> …
## $ colourval                                                                                                      <chr> …

Some info about variables

Dates are monthly over the whole period covered. There is one file per province, and these are also compiled into a single national file. The territories are missing because of data limitations.

#date range
prod_data %>% pull(Date)%>% range
## [1] "1997-01-01" "2021-12-01"
# provinces
prod_data %>% pull(provincename)%>% unique
## [1] "Nova Scotia"

The StatCan geography identifier is held in GeoUID. The count below is the number of distinct census subdivisions in this particular province.

prod_data %>% pull(GeoUID)%>% unique %>% length
## [1] 96

The census information is tied to a single census, defined in the column census_year_ref. This is mainly used as version control for the data. Each GeoUID has its own population count, tied to the referenced census or an eventual interpolation thereof.

prod_data %>% pull(census_year_ref)%>% unique()
## [1] "CA16"

Industries are classified according to the North American Industry Classification System (NAICS). The dominant industry for the geography is defined in Dominant_NAICS. There is a single value per GeoUID. Each industry has an associated hex colour code, held in the column colourval. The colour codes are optional and can be discarded.

prod_data %>% select(Dominant_NAICS, colourval)%>% unique()
## # A tibble: 11 × 2
##    Dominant_NAICS                                                      colourval
##    <chr>                                                               <chr>    
##  1 X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.o… #8DA0CB  
##  2 X31.33.Manufacturing                                                #4DAF4A  
##  3 X91.Public.administration                                           #FC8D62  
##  4 X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing     #A6D854  
##  5 X62.Health.care.and.social.assistance                               #FFFF33  
##  6 X41.Wholesale.trade.44.45.Retail.trade                              #E78AC3  
##  7 <NA>                                                                <NA>     
##  8 X51.Information.culture.and.recreation.71                           #E5C494  
##  9 X61.Educational.services                                            #FF7F00  
## 10 X23.Construction                                                    #377EB8  
## 11 X22.Utilities                                                       #E41A1C

The variables relating to industry all contain two-digit numbers. Those are the NAICS codes that the variable covers. In many cases categories were combined, so multiple numbers appear in one variable name.
Despite the "production" in the variable names, the values approximate productivity (output / effort) for a month within a census subdivision.

prod_data %>% colnames %>% grep(pattern = "production_in_division", value = TRUE)
##  [1] "production_in_division_X22.Utilities"                                                                          
##  [2] "production_in_division_X23.Construction"                                                                       
##  [3] "production_in_division_X31.33.Manufacturing"                                                                   
##  [4] "production_in_division_X48.49.Transportation.and.warehousing"                                                  
##  [5] "production_in_division_X61.Educational.services"                                                               
##  [6] "production_in_division_X62.Health.care.and.social.assistance"                                                  
##  [7] "production_in_division_X72.Accommodation.and.food.services"                                                    
##  [8] "production_in_division_X81.Other.services..except.public.administration."                                      
##  [9] "production_in_division_X91.Public.administration"                                                              
## [10] "production_in_division_X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.oil.and.gas.extraction"
## [11] "production_in_division_X41.Wholesale.trade.44.45.Retail.trade"                                                 
## [12] "production_in_division_X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing"                        
## [13] "production_in_division_X54.Professional..scientific.and.technical.services.55.56"                              
## [14] "production_in_division_X51.Information.culture.and.recreation.71"
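
Because the NAICS codes are embedded in the column names, it can be convenient to reshape the productivity columns into long format and pull the two-digit codes out of the names. The following is a minimal sketch on a made-up two-column stand-in for prod_data; the names toy_prod, toy_long, industry, productivity and naics_codes are illustrative assumptions, not part of the data files.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for prod_data: same column naming pattern, made-up values.
toy_prod <- tibble(
  Date   = as.Date(c("1997-01-01", "1997-02-01")),
  GeoUID = c(1201001, 1201001),
  production_in_division_X22.Utilities = c(1.0, 1.1),
  production_in_division_X41.Wholesale.trade.44.45.Retail.trade = c(2.0, 2.1)
)

toy_long <- toy_prod %>%
  # one row per Date x GeoUID x industry
  pivot_longer(
    cols         = starts_with("production_in_division_"),
    names_to     = "industry",
    names_prefix = "production_in_division_",
    values_to    = "productivity"
  ) %>%
  # list-column holding every two-digit number found in the industry name
  mutate(naics_codes = str_extract_all(industry, "[0-9]{2}"))

toy_long
```

Note that this only extracts the numbers as written in the name. A name such as X31.33.Manufacturing yields "31" and "33", which denote the NAICS range 31 to 33 rather than two separate codes, so ranges would need extra handling.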

Geographies

The geography files are held separately from the data so as to avoid replicating the shapes across the months of the data. The GeoUID links the geographies with their productivity.

# need to use the abbreviation for the geography file name rather than the full province name... Sorry about that.
# This will auto extract it though from the above
ProvinceCode = province_codings %>% filter(province == Province) %>% pull(abv)

geometryfilename = paste0(locationgeog, '1.a_census_data_',ProvinceCode,'_CSD_geometry_only.geojson')
geom =  geojson_sf(geometryfilename)

geom %>% glimpse
## Rows: 96
## Columns: 4
## $ GeoUID       <chr> "1201001", "1201004", "1201006", "1201008", "1201009", "1…
## $ Region.Name  <chr> "Barrington (MD)", "Clark's Harbour (T)", "Shelburne (MD)…
## $ provincename <chr> "Nova Scotia", "Nova Scotia", "Nova Scotia", "Nova Scotia…
## $ geometry     <MULTIPOLYGON [°]> MULTIPOLYGON (((-65.4581 43..., MULTIPOLYGON…
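
Linking the two tables through GeoUID needs one extra step: the glimpse outputs above show GeoUID as <dbl> in prod_data but as <chr> in geom, and dplyr joins refuse to match keys of incompatible types. The sketch below shows one way to handle this on toy data. The objects toy_geom, toy_prod and toy_joined are made up for illustration, and points are used in place of the real MULTIPOLYGON shapes to keep the example short.

```r
library(dplyr)
library(sf)

# Toy geometry table: GeoUID stored as character, as in geom.
toy_geom <- st_sf(
  GeoUID   = c("1201001", "1201004"),
  geometry = st_sfc(
    st_point(c(-65.46, 43.50)),
    st_point(c(-65.63, 43.44)),
    crs = 4326
  )
)

# Toy productivity table: GeoUID stored as a number, as in prod_data.
toy_prod <- tibble(
  Date   = as.Date(c("1997-01-01", "1997-01-01")),
  GeoUID = c(1201001, 1201004),
  production_in_division_X22.Utilities = c(1.0, 2.0)  # made-up values
)

toy_joined <- toy_geom %>%
  left_join(
    # sprintf("%.0f") avoids any risk of scientific notation in the key
    toy_prod %>% mutate(GeoUID = sprintf("%.0f", GeoUID)),
    by = "GeoUID"
  )

toy_joined
```

Starting the join from the sf object keeps the result an sf object, so it can be passed straight to plotting functions. With the full data, filtering prod_data to a single Date before joining avoids replicating each shape across all months.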

Weather Data:

The weather data is taken directly from Environment and Climate Change Canada.

The climate variables are compiled monthly at weather stations. The variables include the geographic location, time variables, and a wide range of climate variables. Not all variables are measured at each weather station. The units are included in the weather variable names. Mean values (such as Mean Max Temp (°C)) are taken across all calendar dates in the month. Extreme values are taken across the whole month (for example, Extr Min Temp (°C) is the coldest minimum temperature recorded in the month). The full set of variable definitions is also available.

weather_data_full = read_csv(paste0(locationclim,"weather_Station_data.csv"),show_col_types = FALSE)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
weather_data_full %>% glimpse
## Rows: 67,710
## Columns: 29
## $ `Longitude (x)`              <dbl> -61.68, -61.68, -61.68, -61.68, -61.68, -…
## $ `Latitude (y)`               <dbl> 56.55, 56.55, 56.55, 56.55, 56.55, 56.55,…
## $ `Station Name`               <chr> "NAIN", "NAIN", "NAIN", "NAIN", "NAIN", "…
## $ `Climate ID`                 <chr> "8502799", "8502799", "8502799", "8502799…
## $ `Date/Time`                  <chr> "2004-11", "2004-12", "2005-01", "2005-02…
## $ Year                         <dbl> 2004, 2004, 2005, 2005, 2005, 2005, 2005,…
## $ Month                        <chr> "11", "12", "01", "02", "03", "04", "05",…
## $ `Mean Max Temp (°C)`         <dbl> -0.2, -9.7, -18.4, -8.9, -5.2, 2.3, 7.8, …
## $ `Mean Max Temp Flag`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Mean Min Temp (°C)`         <dbl> -6.7, -16.9, -25.4, -19.8, -14.5, -5.7, -…
## $ `Mean Min Temp Flag`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Mean Temp (°C)`             <dbl> -3.4, -13.3, -21.9, -14.4, -9.9, -1.7, 3.…
## $ `Mean Temp Flag`             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Extr Max Temp (°C)`         <dbl> 5.1, -1.0, -9.7, 3.8, 4.0, 8.5, 21.3, 24.…
## $ `Extr Max Temp Flag`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Extr Min Temp (°C)`         <dbl> -13.3, -27.1, -30.8, -29.2, -30.4, -14.9,…
## $ `Extr Min Temp Flag`         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Rain (mm)`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Rain Flag`            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Snow (cm)`            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Snow Flag`            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Precip (mm)`          <dbl> 80.7, 88.0, 42.5, 119.8, 145.8, 44.2, 61.…
## $ `Total Precip Flag`          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Snow Grnd Last Day (cm)`    <dbl> 26, 24, 18, 51, 45, 26, 0, 0, 0, 0, 0, NA…
## $ `Snow Grnd Last Day Flag`    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Dir of Max Gust (10's deg)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Dir of Max Gust Flag`       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Spd of Max Gust (km/h)`     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Spd of Max Gust Flag`       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
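
To line the weather data up with the productivity data, two things are likely to be useful: a proper date column and a spatial representation of the stations. The sketch below assumes that `Date/Time` always has the form "YYYY-MM" and that the station coordinates are plain longitude/latitude degrees (WGS84, EPSG:4326). Both are assumptions to check against the full file. The object names toy_weather and toy_weather_sf are illustrative, and only a few of the 29 columns are reproduced.

```r
library(dplyr)
library(lubridate)
library(sf)

# Toy stand-in for weather_data_full, using values shown in the glimpse above.
toy_weather <- tibble(
  `Longitude (x)`     = c(-61.68, -61.68),
  `Latitude (y)`      = c(56.55, 56.55),
  `Station Name`      = c("NAIN", "NAIN"),
  `Date/Time`         = c("2004-11", "2004-12"),
  `Total Precip (mm)` = c(80.7, 88.0)
)

toy_weather_sf <- toy_weather %>%
  # ym() parses "YYYY-MM" into a Date on the first day of that month
  mutate(Date = ym(`Date/Time`)) %>%
  # turn the coordinate columns into point geometries (assumed CRS: WGS84)
  st_as_sf(coords = c("Longitude (x)", "Latitude (y)"), crs = 4326)

toy_weather_sf
```

Dates built this way fall on the first of the month, the same convention as the Date column in the productivity data. Once the stations are points, functions such as st_join() or st_nearest_feature() from sf could be used to associate stations with census subdivisions.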