Libraries used for this project:
# SPDS
library(tidyverse)
library(sf)
# Data
library(USAboundariesData)
library(USAboundaries)
library(readxl)
# Visualization
library(rmapshaper)
library(gghighlight)
library(knitr)
library(leaflet)
## [1] 3229
## [1] 256
By simplifying the geometry, we went from 3229 points to 256 points. The ms_simplify function removed 2973 points, about 92% of the original. With fewer points to process, any later computation on this geometry runs faster.
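As a minimal sketch of how such before-and-after point counts can be produced, the example below simplifies a sample geometry with rmapshaper's `ms_simplify` and counts vertices with `sf::st_coordinates`. The North Carolina dataset that ships with sf, the `keep = 0.1` setting, and the helper `count_points` are assumptions for illustration only; they are not the data or settings behind the numbers above.

```r
library(sf)
library(rmapshaper)

# Count the vertices across all features of an sf object.
# `count_points` is a hypothetical helper, not part of the original analysis.
count_points <- function(x) nrow(st_coordinates(x))

# Demo geometry: the North Carolina counties bundled with sf,
# standing in for the boundaries used in the report.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Keep roughly 10% of the vertices (an assumed value for this sketch).
nc_simple <- ms_simplify(nc, keep = 0.1)

n_before <- count_points(nc)
n_after  <- count_points(nc_simple)

# Number of points removed by the simplification.
n_removed <- n_before - n_after
```

A lower `keep` removes more vertices and speeds up later spatial operations, at the cost of a coarser boundary.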
Type | Elements | Mean Area (km²) | Standard Deviation of Area (km²) | Coverage Area (km²) |
---|---|---|---|---|
Counties | 3,108 | 2,521.745 | 3,404.325 | 7,837,583 |
Voronoi | 3,107 | 2,521.886 | 2,885.325 | 7,835,500 |
Triangulation | 6,195 | 1,252.315 | 1,576.670 | 7,758,091 |
Grid | 3,108 | 2,728.126 | 0.000 | 8,479,014 |
Hexagon | 2,271 | 3,763.052 | 0.000 | 8,545,891 |
Looking at the traits of each tessellation, the square grid and hexagon tessellations both have an area standard deviation of zero. This is because both split the continental United States into equal-area tiles, either squares or hexagons. The table above also shows that the triangulation tessellation has the smallest mean area. This is because it has the greatest number of tiles, and it also has a smaller standard deviation of area than the Voronoi tessellation or the original counties. However, the triangulation tessellation may become a problem when computing point-in-polygon counts purely because of its number of elements: the more elements there are, the longer the computer takes to process them.
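The columns of the table are linked by simple arithmetic: coverage is the sum of the tile areas, which equals the number of elements times the mean area. The base R sketch below illustrates this with a hypothetical helper, `summarize_tiles`, that takes a plain vector of tile areas in km². It then checks the reported figures against that relationship. It is not the summary function used to build the table.

```r
# Summarise a tessellation from a numeric vector of tile areas (km^2).
# `summarize_tiles` is a hypothetical helper, not from the original analysis.
summarize_tiles <- function(areas_km2, type) {
  data.frame(
    type      = type,
    elements  = length(areas_km2),
    mean_area = mean(areas_km2),
    sd_area   = sd(areas_km2),
    coverage  = sum(areas_km2)
  )
}

# Equal-area tiles (as in a square or hexagonal grid) have no spread in area.
equal_tiles <- summarize_tiles(rep(2728.126, 3108), "Grid")

# Figures copied from the table above.
reported <- data.frame(
  type      = c("Counties", "Voronoi", "Triangulation", "Grid", "Hexagon"),
  elements  = c(3108, 3107, 6195, 3108, 2271),
  mean_area = c(2521.745, 2521.886, 1252.315, 2728.126, 3763.052),
  coverage  = c(7837583, 7835500, 7758091, 8479014, 8545891)
)

# Coverage implied by elements x mean area, and its relative gap to the table.
reported$implied_coverage <- reported$elements * reported$mean_area
reported$rel_diff <- abs(reported$implied_coverage - reported$coverage) /
  reported$coverage
```

The small relative differences come from rounding in the printed means.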
The choice of tessellation can have a large impact on how data is represented in a visualization such as a map of the United States. This is the modifiable areal unit problem (MAUP), a source of statistical bias that can significantly affect the results of statistical hypothesis tests when point data is aggregated into areal units. For example, the triangulation tessellation has more elements, so it can show areas with many points in finer detail, but its tiles do not have equal area, so larger tiles will tend to contain more points simply because they are larger. For that reason, I chose the hexagonal tessellation: all of its elements are the same size, so it gives a fairer picture of the number of dams per tile.
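To make the equal-area argument concrete, here is a minimal sketch of counting points per hexagonal tile with sf. The square study area, the random points standing in for dams, and the grid size are all made up for illustration; only `st_make_grid`, `st_sample`, `st_intersects`, and `st_area` are real sf functions.

```r
library(sf)

set.seed(42)  # reproducible fake points

# Hypothetical 100 km x 100 km study area in planar coordinates
# (metres, no CRS set); not the real US boundary.
region <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 1e5, ymax = 1e5)))

# Hexagonal tessellation covering the region.
hex <- st_sf(geometry = st_make_grid(region, n = c(10, 10), square = FALSE))
hex$id <- seq_len(nrow(hex))

# Random points standing in for dam locations.
dams <- st_sf(geometry = st_sample(region, size = 500))

# Point-in-polygon: for each tile, count the points it intersects.
hex$n_dams <- lengths(st_intersects(hex, dams))

# Coefficient of variation of tile area; near zero when tiles are equal in size.
tile_areas   <- as.numeric(st_area(hex))
tile_area_cv <- sd(tile_areas) / mean(tile_areas)
```

Because every tile has the same area, `n_dams` can be compared directly across tiles. With unequal tiles, the counts would first need to be divided by tile area.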
Out of all the types of dams, I chose the four that interested me most: dams with the purpose of fire protection, water supply, flood control, and navigation. I wanted to see whether the locations of these types of dams made sense. For example, I was interested to see whether dams built for fire protection are located in areas prone to wildfires.
Looking at the geographic distribution of these types of dams, many of their locations make sense to me, but many do not. Because the tessellation I chose has tiles of equal area, the counts per tile are directly comparable, and the tiles with the most dams cluster toward the center of the US. This matches my findings: dams for fire protection, water supply, and flood control all had clusters in the middle of the continental United States. However, the placement of fire protection dams confused me a bit, as the majority are in the central US, while it is widely known that California has the most wildfires per year in the US. I was also unsure why so many of the flood control dams were in the center of the US. Do Texas, Kansas, and the states north of them have issues with flooding? The map of navigation dams made the most sense, as they are clustered along the Mississippi River, the main waterway that runs through the United States.