version
## _
## platform x86_64-apple-darwin17.0
## arch x86_64
## os darwin17.0
## system x86_64, darwin17.0
## status
## major 4
## minor 2.2
## year 2022
## month 10
## day 31
## svn rev 83211
## language R
## version.string R version 4.2.2 (2022-10-31)
## nickname Innocent and Trusting
library(tidyverse) # for its nice flow.
library(lubridate) # for handling dates
library(sf) # for geo manipulations
library(geojsonsf) # for loading geojson files
library(htmltools) # for leaflet support
library(leaflet) # for fancy plots
There are 3 data files.
Once unzipped, the following code will load the datasets.
# set up data locations:
location = paste0(getwd(),"/")
locationprod = paste0(location,"Productivity per NAICS within region/")
locationgeog = paste0(location,"geojson_files/")
locationclim = paste0(location,"climate_data/")
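Because the paths above are built from the working directory, a quick check that the unzipped folders are where the code expects them can save a confusing `read_csv` error later. The following is a minimal base R sketch; `check_data_dirs` is a hypothetical helper name, not part of the original workflow.

```r
# Hypothetical helper: report which of the expected data folders exist.
check_data_dirs <- function(paths) {
  # Drop trailing slashes, which some platforms handle inconsistently.
  clean <- sub("/+$", "", paths)
  found <- dir.exists(clean)
  names(found) <- paths
  if (any(!found)) {
    warning("Missing data folders: ", paste(paths[!found], collapse = ", "))
  }
  found
}

# With the variables defined above, this could be called as:
# check_data_dirs(c(locationprod, locationgeog, locationclim))
```

The function returns a named logical vector, so it can also be used programmatically, for example with `stopifnot(all(...))`.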
Set up the Canadian province and territory names and their 2-letter abbreviations.
province_codings = matrix(c("Newfoundland and Labrador", "NL",
"Prince Edward Island", "PE",
"Nova Scotia", "NS",
"New Brunswick", "NB",
"Quebec", "QC",
"Ontario", "ON",
"Manitoba", "MB",
"Saskatchewan", "SK",
"British Columbia", "BC",
"Alberta", "AB",
"Yukon", "YT",
"Northwest Territories", "NT",
"Nunavut", "NU"),
ncol=2,byrow=TRUE,
dimnames = list(NULL, c("province", "abv"))) %>%
as_tibble()
The productivity data come from combining national monthly GDP data with provincial annual GDP, measures of effort (provincial monthly total hours worked), and the census counts of people working in different industries in different geographic locations.
All geographies are Census Subdivisions (CSDs), identified by their 2016 census GeoUIDs.
Province = "Nova Scotia"
## or get the right spelling from the 2-letter abbreviation:
# Province = province_codings %>% filter(abv == "NS") %>% pull(province)
## note that there is a national file, but it's bigger and slower to play with
prod_data = read_csv(paste0(locationprod,"4.c_production_in_CSD_in",Province,".csv"), show_col_types = FALSE)
prod_data %>% glimpse
## Rows: 28,800
## Columns: 21
## $ Date <date> …
## $ provincename <chr> …
## $ production_in_division_X22.Utilities <dbl> …
## $ production_in_division_X23.Construction <dbl> …
## $ production_in_division_X31.33.Manufacturing <dbl> …
## $ production_in_division_X48.49.Transportation.and.warehousing <dbl> …
## $ production_in_division_X61.Educational.services <dbl> …
## $ production_in_division_X62.Health.care.and.social.assistance <dbl> …
## $ production_in_division_X72.Accommodation.and.food.services <dbl> …
## $ production_in_division_X81.Other.services..except.public.administration. <dbl> …
## $ production_in_division_X91.Public.administration <dbl> …
## $ production_in_division_X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.oil.and.gas.extraction <dbl> …
## $ production_in_division_X41.Wholesale.trade.44.45.Retail.trade <dbl> …
## $ production_in_division_X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing <dbl> …
## $ production_in_division_X54.Professional..scientific.and.technical.services.55.56 <dbl> …
## $ production_in_division_X51.Information.culture.and.recreation.71 <dbl> …
## $ Population <dbl> …
## $ GeoUID <dbl> …
## $ census_year_ref <chr> …
## $ Dominant_NAICS <chr> …
## $ colourval <chr> …
Dates are monthly over the whole period. There is one file per province, and these are also compiled into a single national file. The territories are missing because of data limitations.
#date range
prod_data %>% pull(Date)%>% range
## [1] "1997-01-01" "2021-12-01"
# provinces
prod_data %>% pull(provincename)%>% unique
## [1] "Nova Scotia"
The StatCan geography is identified by GeoUID. The following counts the distinct GeoUIDs in this particular province.
prod_data %>% pull(GeoUID)%>% unique %>% length
## [1] 96
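The numbers shown so far are consistent with a complete panel: January 1997 to December 2021 is 25 × 12 = 300 months, and 96 GeoUIDs × 300 months = 28,800 rows, which matches the `glimpse` output. A minimal base R sketch of that check follows, run on made-up toy data; `is_balanced_panel` is a hypothetical helper name.

```r
# Toy stand-in for prod_data: 3 GeoUIDs x 4 months (illustrative only).
toy <- expand.grid(
  GeoUID = c(1201001, 1201004, 1201006),
  Date   = seq(as.Date("1997-01-01"), by = "month", length.out = 4)
)

# Hypothetical helper: TRUE when every GeoUID has exactly one row per date.
is_balanced_panel <- function(df, id = "GeoUID", date = "Date") {
  n_ids   <- length(unique(df[[id]]))
  n_dates <- length(unique(df[[date]]))
  no_dups <- !any(duplicated(df[, c(id, date)]))
  no_dups && nrow(df) == n_ids * n_dates
}

is_balanced_panel(toy)        # complete toy panel
is_balanced_panel(toy[-1, ])  # one GeoUID-month removed
```

The same function could be applied to `prod_data` directly, since it only relies on the `GeoUID` and `Date` columns.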
The census information is tied to a single census, defined in the column census_year_ref. This is mainly used as version control for the data. Each GeoUID has its own population count, tied to the referenced census or an eventual interpolation thereof.
prod_data %>% pull(census_year_ref)%>% unique()
## [1] "CA16"
Industries are classified according to the North American Industry Classification System (NAICS). The dominant industry for the geography is defined in Dominant_NAICS. There is a single value per GeoUID. Each industry has an associated hex colour code, held in the column colourval. The colour codes are optional and can be discarded.
prod_data %>% select(Dominant_NAICS, colourval)%>% unique()
## # A tibble: 11 × 2
## Dominant_NAICS colourval
## <chr> <chr>
## 1 X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.o… #8DA0CB
## 2 X31.33.Manufacturing #4DAF4A
## 3 X91.Public.administration #FC8D62
## 4 X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing #A6D854
## 5 X62.Health.care.and.social.assistance #FFFF33
## 6 X41.Wholesale.trade.44.45.Retail.trade #E78AC3
## 7 <NA> <NA>
## 8 X51.Information.culture.and.recreation.71 #E5C494
## 9 X61.Educational.services #FF7F00
## 10 X23.Construction #377EB8
## 11 X22.Utilities #E41A1C
The variables relating to industry all contain 2-digit numbers. Those
are the NAICS codes that the variable encompasses. In many cases categories were
combined, so multiple numbers appear in one variable name.
Despite the production_in_division variable names, the values actually
approximate productivity = output / effort
for a month within a census subdivision.
prod_data %>% colnames %>% grep(pattern = "production_in_division", value = TRUE)
## [1] "production_in_division_X22.Utilities"
## [2] "production_in_division_X23.Construction"
## [3] "production_in_division_X31.33.Manufacturing"
## [4] "production_in_division_X48.49.Transportation.and.warehousing"
## [5] "production_in_division_X61.Educational.services"
## [6] "production_in_division_X62.Health.care.and.social.assistance"
## [7] "production_in_division_X72.Accommodation.and.food.services"
## [8] "production_in_division_X81.Other.services..except.public.administration."
## [9] "production_in_division_X91.Public.administration"
## [10] "production_in_division_X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.oil.and.gas.extraction"
## [11] "production_in_division_X41.Wholesale.trade.44.45.Retail.trade"
## [12] "production_in_division_X52.Finance.and.insurance.53.Real.estate.and.rental.and.leasing"
## [13] "production_in_division_X54.Professional..scientific.and.technical.services.55.56"
## [14] "production_in_division_X51.Information.culture.and.recreation.71"
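Since the NAICS codes are embedded in the variable names, they can be recovered with a regular expression. The sketch below uses base R on a few of the names listed above; `extract_naics` is a hypothetical helper name. Note the assumption that a pair such as `31.33` stands for the NAICS range 31-33 (Manufacturing), so intermediate codes such as 32 are implied but do not appear in the name.

```r
# A few of the industry variable names from the productivity data.
industry_vars <- c(
  "production_in_division_X22.Utilities",
  "production_in_division_X31.33.Manufacturing",
  "production_in_division_X54.Professional..scientific.and.technical.services.55.56",
  "production_in_division_X11.Agriculture.forestry.fishing.hunting.21.Mining.quarrying.and.oil.and.gas.extraction"
)

# Hypothetical helper: pull every 2-digit number out of each variable name.
extract_naics <- function(vars) {
  codes <- regmatches(vars, gregexpr("[0-9]{2}", vars))
  names(codes) <- vars
  codes
}

naics_codes <- extract_naics(industry_vars)
naics_codes
```

The result is a named list with one character vector of codes per variable, which makes it straightforward to look up which column covers a given NAICS sector.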
The geography files are held separately from the data so as to avoid replicating the shapes across the months of the data. The GeoUID links the geographies with their productivity.
# need to use the abbreviation for the geography file name rather than the full province name... Sorry about that.
# This will auto extract it though from the above
ProvinceCode = province_codings %>% filter(province == Province) %>% pull(abv)
geometryfilename = paste0(locationgeog, '1.a_census_data_',ProvinceCode,'_CSD_geometry_only.geojson')
geom = geojson_sf(geometryfilename)
geom %>% glimpse
## Rows: 96
## Columns: 4
## $ GeoUID <chr> "1201001", "1201004", "1201006", "1201008", "1201009", "1…
## $ Region.Name <chr> "Barrington (MD)", "Clark's Harbour (T)", "Shelburne (MD)…
## $ provincename <chr> "Nova Scotia", "Nova Scotia", "Nova Scotia", "Nova Scotia…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-65.4581 43..., MULTIPOLYGON…
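One detail to watch when linking the two sources: the `glimpse` output shows GeoUID as `<dbl>` in the productivity data but as `<chr>` in the geometry file, so the key needs a common type before joining. The following base R sketch illustrates the idea on toy data frames; the population values are made up, and plain data frames stand in for the `sf` object.

```r
# Toy stand-ins (made-up values): GeoUID is numeric in the productivity
# data but character in the geometry file.
toy_prod <- data.frame(
  GeoUID     = c(1201001, 1201004, 1201006),
  Population = c(100, 200, 300)
)
toy_geom <- data.frame(
  GeoUID      = c("1201001", "1201004", "1201006"),
  Region.Name = c("Barrington (MD)", "Clark's Harbour (T)", "Shelburne (MD)"),
  stringsAsFactors = FALSE
)

# Convert the numeric key to character, avoiding scientific notation.
toy_prod$GeoUID <- format(toy_prod$GeoUID, scientific = FALSE, trim = TRUE)

# Keep every geography, attaching productivity columns where available.
joined <- merge(toy_geom, toy_prod, by = "GeoUID", all.x = TRUE)
joined
```

With the real objects, a tidyverse equivalent would presumably be `geom %>% left_join(prod_data %>% mutate(GeoUID = as.character(GeoUID)), by = "GeoUID")`. Putting `geom` on the left should keep the result an `sf` object, but that is worth confirming on the actual data.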
The weather data are taken directly from Environment and Climate Change Canada.
The climate variables are compiled monthly at weather stations. The
variables include the geographic location, time variables, and a wide
range of climate variables. Not all variables are measured at each
weather station. The units are included in the weather variable names.
Mean values (such as Mean Max Temp (°C)) are taken
across all calendar days in the month. Extreme values are taken across
the whole month (for example, Extr Min Temp (°C) is the coldest minimum
temperature recorded for the month). The full set of variable
definitions is also available.
weather_data_full = read_csv(paste0(locationclim,"weather_Station_data.csv"),show_col_types = FALSE)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
weather_data_full %>% glimpse
## Rows: 67,710
## Columns: 29
## $ `Longitude (x)` <dbl> -61.68, -61.68, -61.68, -61.68, -61.68, -…
## $ `Latitude (y)` <dbl> 56.55, 56.55, 56.55, 56.55, 56.55, 56.55,…
## $ `Station Name` <chr> "NAIN", "NAIN", "NAIN", "NAIN", "NAIN", "…
## $ `Climate ID` <chr> "8502799", "8502799", "8502799", "8502799…
## $ `Date/Time` <chr> "2004-11", "2004-12", "2005-01", "2005-02…
## $ Year <dbl> 2004, 2004, 2005, 2005, 2005, 2005, 2005,…
## $ Month <chr> "11", "12", "01", "02", "03", "04", "05",…
## $ `Mean Max Temp (°C)` <dbl> -0.2, -9.7, -18.4, -8.9, -5.2, 2.3, 7.8, …
## $ `Mean Max Temp Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Mean Min Temp (°C)` <dbl> -6.7, -16.9, -25.4, -19.8, -14.5, -5.7, -…
## $ `Mean Min Temp Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Mean Temp (°C)` <dbl> -3.4, -13.3, -21.9, -14.4, -9.9, -1.7, 3.…
## $ `Mean Temp Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Extr Max Temp (°C)` <dbl> 5.1, -1.0, -9.7, 3.8, 4.0, 8.5, 21.3, 24.…
## $ `Extr Max Temp Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Extr Min Temp (°C)` <dbl> -13.3, -27.1, -30.8, -29.2, -30.4, -14.9,…
## $ `Extr Min Temp Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Rain (mm)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Rain Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Snow (cm)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Snow Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Total Precip (mm)` <dbl> 80.7, 88.0, 42.5, 119.8, 145.8, 44.2, 61.…
## $ `Total Precip Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Snow Grnd Last Day (cm)` <dbl> 26, 24, 18, 51, 45, 26, 0, 0, 0, 0, 0, NA…
## $ `Snow Grnd Last Day Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Dir of Max Gust (10's deg)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Dir of Max Gust Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Spd of Max Gust (km/h)` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Spd of Max Gust Flag` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
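The weather file stores its month as a character string in `Date/Time` (for example "2004-11"), whereas the productivity data uses a `<date>` column whose values fall on the first of each month. A minimal base R sketch of aligning the two follows, assuming the first-of-month convention is the one wanted; the toy rows reuse the first few NAIN values shown above.

```r
# Toy stand-in for weather_data_full with the original column names.
toy_weather <- data.frame(
  `Date/Time`         = c("2004-11", "2004-12", "2005-01"),
  `Total Precip (mm)` = c(80.7, 88.0, 42.5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Append "-01" so each year-month string parses as the first day of that month.
toy_weather$Date <- as.Date(paste0(toy_weather[["Date/Time"]], "-01"))
toy_weather
```

The new `Date` column then has the same type and alignment as `Date` in the productivity data, so the two can be matched by month. `lubridate::ym()` is an alternative for the same conversion.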