Intro
Dashboards are a fantastic way to condense a large amount of data into a small and comprehensible display. Interactive dashboards are even better as the user can engage with the data to identify trends and patterns.
For this project, let’s start by identifying the data that we will use for the dashboard. Luckily, New York provides several APIs for Covid-19-related data. We are going to use the NYS Statewide testing data.
Getting the Data
Using the API is a good idea: the dataset is updated every day, so a downloaded CSV would be outdated after 24 hours.
In addition, the data has thousands of observations. If we pull the whole dataset at once we will be asking the NYS website for a very large file. Instead let’s only pull data for one date at a time. For this example let’s look at December 1, 2020.
Here’s what everything looks like put together…
library(httr)
library(RCurl)
library(tidyverse)
library(jsonlite)
path<-'https://health.data.ny.gov/resource/xdss-u53e.json'
extra_str<-'T00:00:00.000'
desired_date<-'2020-12-01'
clean_date<-paste(desired_date,extra_str,sep='')
request <- httr::GET(path,query= list( test_date=clean_date))
response <- content(request, as = "text", encoding = "UTF-8")
DF <- fromJSON(response, flatten = TRUE) %>% data.frame()
This code makes a request for NYS data where the date of the Covid data is equal to the desired date. Since all dates in the data set end with T00:00:00.000, we need to paste that string onto the date before we make the request. This is handled by creating the clean_date variable.
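If you plan to pull more than one date, the request can be wrapped in a small function. The following is only a sketch: the names format_test_date() and fetch_tests() are my own inventions (they are not part of httr or the NYS API), and it assumes the endpoint keeps accepting test_date as a query parameter the way it does in the code above.

```r
# Hypothetical helper: turn a single "YYYY-MM-DD" string (or Date) into the
# timestamp format the dataset uses, e.g. "2020-12-01T00:00:00.000".
format_test_date <- function(date) {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (length(parsed) != 1 || is.na(parsed)) {
    stop("date must be a single value that looks like 'YYYY-MM-DD'")
  }
  paste0(format(parsed, "%Y-%m-%d"), "T00:00:00.000")
}

# Hypothetical wrapper around the request shown above.
# Assumes httr and jsonlite are installed; no request is made until it is called.
fetch_tests <- function(date,
                        path = "https://health.data.ny.gov/resource/xdss-u53e.json") {
  request <- httr::GET(path, query = list(test_date = format_test_date(date)))
  httr::stop_for_status(request)  # stop with an error on a failed HTTP request
  response <- httr::content(request, as = "text", encoding = "UTF-8")
  data.frame(jsonlite::fromJSON(response, flatten = TRUE))
}

format_test_date("2020-12-01")
```

Calling fetch_tests("2020-12-01") should then be equivalent to the request above, with the added benefit that a mistyped date fails before anything is sent to the server.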
Let’s see what this data looks like…
glimpse(DF)
## Rows: 62
## Columns: 6
## $ test_date <chr> "2020-12-01T00:00:00.000", "2020-12-01T00:00:00.000", ...
## $ county <chr> "Albany", "Allegany", "Bronx", "Broome", "Cattaraugus"...
## $ new_positives <chr> "170", "11", "511", "103", "35", "13", "22", "31", "3"...
## $ cumulative_number_of_positives <chr> "6138", "1069", "65005", "5551", "1199", "930", "1631"...
## $ total_number_of_tests <chr> "2656", "338", "10712", "2123", "592", "573", "821", "...
## $ cumulative_number_of_tests <chr> "262872", "41255", "1305152", "228386", "61716", "7049.
It looks like for every day there is one record per county for new positive cases per day, cumulative cases, tests per day, and cumulative tests.
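Notice that glimpse() reports every column as <chr>, including the counts. Here is a minimal sketch of converting them to numbers. It uses a hand-typed three-row stand-in (sample_df, a name I made up) with values copied from the output above, so it runs without calling the API.

```r
# Stand-in for DF: the first three counties from the glimpse() output above.
sample_df <- data.frame(
  county = c("Albany", "Allegany", "Bronx"),
  new_positives = c("170", "11", "511"),
  total_number_of_tests = c("2656", "338", "10712"),
  stringsAsFactors = FALSE
)

# Convert every column except county from character to numeric.
count_cols <- setdiff(names(sample_df), "county")
sample_df[count_cols] <- lapply(sample_df[count_cols], as.numeric)

str(sample_df)
```

In this post only new_positives gets converted, and only after the join. Converting everything up front is just an alternative if you want to use the other columns too.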
Refining the Dataframe
Let’s make our map show the number of new positive cases per day. To do this we’ll drop every column except the county name and the number of new positives per day. This new dataframe will be called pos_tests. Lastly, the county map data we’ll get through ggplot2 uses lowercase county names (sorry, I don’t make the rules), so let’s take care of that here as well.
pos_tests <-DF[,c(2:3)]
pos_tests$county<-tolower(pos_tests$county)
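Selecting by position works as long as the API keeps returning the columns in the same order. As a slightly more defensive variation (my suggestion, not something the data requires), the same two columns can be picked by name. The toy_df below is a made-up stand-in so the snippet runs on its own.

```r
# Toy stand-in for DF with the same first three column names as the real data.
toy_df <- data.frame(
  test_date = rep("2020-12-01T00:00:00.000", 3),
  county = c("Albany", "Allegany", "Bronx"),
  new_positives = c("170", "11", "511"),
  stringsAsFactors = FALSE
)

# Same columns as toy_df[, c(2:3)], but independent of column order.
toy_pos_tests <- toy_df[, c("county", "new_positives")]
toy_pos_tests$county <- tolower(toy_pos_tests$county)

toy_pos_tests
```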
Mapping New York
To create the actual map we will need a map of New York divided by county. Using ggplot2 we can get the data needed for this map (map_data() needs the maps package to be installed). Let’s rename the subregion column to county and remove columns 4 and 5 (order and region) since we won’t be needing them. Also, don’t forget to keep those county names lowercase!
library(ggplot2)
ny_counties <- map_data("county","new york")
county_df <- dplyr::rename(ny_counties,county=subregion)
county_df<-county_df[,-c(4:5)]
county_df$county<-tolower(county_df$county)
Joining the Dataframes
Now we have two dataframes: one with the Covid data per county and one with the data to map each county. Let’s join them using the full_join() function from the dplyr package. This is the last dataframe we are making, so I’m going to call it final_df. Also, we need to change the data type of the number of positive tests from character to numeric.
final_df<- dplyr::full_join(county_df,pos_tests,by='county')
final_df$new_positives<-as.numeric(final_df$new_positives)
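A full join keeps rows from both sides, so a county whose name is spelled differently in the two dataframes ends up with NA for new_positives and won’t be coloured by the gradient. One way to spot that before plotting is to compare the two sets of names. The vectors below are made-up illustrations, not the actual contents of county_df or pos_tests.

```r
# Illustrative names only; in practice use unique(county_df$county)
# and unique(pos_tests$county).
map_names   <- c("albany", "bronx", "st lawrence")
covid_names <- c("albany", "bronx", "st. lawrence")

# Names present in the map data but missing from the Covid data, and vice versa.
only_in_map   <- setdiff(map_names, covid_names)
only_in_covid <- setdiff(covid_names, map_names)

only_in_map
only_in_covid
```

If both results are empty, every county matched. Anything that shows up here is a candidate for a quick rename before the join.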
Putting it all together with ggplot2
Now that all the hard work is over with, we can begin the mapping. Let’s break out ggplot() and bring this all together.
title<-paste('Daily Positive Covid Tests on',desired_date)
nys_map<-ggplot(final_df)+ aes(long,lat, group=group)+
geom_polygon(aes(fill=new_positives))+
scale_fill_gradient(low = 'grey', high = 'red', name = 'New Positives')+
coord_map()+xlab('')+ylab('')+theme_minimal()+
ggtitle(title)
nys_map
Gotta love it when a plan comes together. Yates county seems to have some reporting issues, but we will leave that alone for now. Otherwise, we have successfully created a map to track Covid-19 cases.
BUT WAIT! I know what you’re thinking: “Hey, this guy said this would be an interactive map.” Yes, and I have definitely saved the best for last. For those of you who do not know how to make a ggplot interactive, it is my pleasure to introduce you to the plotly package.
library(plotly)
ggplotly(nys_map)
Wow! Look at that, we are now able to zoom in and hover over counties to see their individual data.
This is a good place to end this post, but stay tuned for part two where we’ll incorporate this map into a full dashboard with a friendly UI.