R Programming Language

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis.

Show Only ...
Maps - Photos - Videos

There was a time when PHP was my go to language for everything

Sure, I used other languages – Javascript, Python, C and even C++ but I really liked PHP for there being so many libraries and it being so easy to tie into web apps.

Python was fine but it was something that I used when choices were limited – like for QGIS scripting and automation. Then over summer vacation I got that book out of the library about data science and discovered the benefits of an interactive intrepter with Jupyter. Python’s data frames model with pandas is quite powerful. I enjoy working with them. But I also found matplotlib, the graphing and mapping libraries for Python to be often lacking when trying to output quality, professional looking graphics.

I heard a lot of good things about R so I decided to take it up last autumn, and I haven’t looked back…

R has a werid syntax but the pipe model that can be used throughout is incredibly powerful and ggplot2 with only a little bit of tweaking makes amazing graphs and maps that look like they came out of a professional GIS program. Things I used to do in QGIS, I’m increasingly doing with the R programming language and I even converting a lot of my PHP and Python code now over to R. I am really hooked on this language after struggling with it a bit at first. 

Why Learn R Programming Language?

R is weird, if you are coming from other programming languages, especially those that come out of the C tradition — which is the most common base for languages — be it Perl, Python, Basic or any of C languages. Some of the operators are weird, the function names non-obvious, the arrays start with 1 rather 0.

Some of the functions are cumbersome to type too — the dpylr pipe as %>% can be annoying to type repeatably — which is why RStudio contains shortcuts to speed programming. Likewise, the same can be said about the <- assignment operator, which also is obnoxious to type compared to what most other languages use.

But what makes great R pretty awesome is it is an actually quite compact language for creating graphs, charts and even maps due to the pipe mechanism and many very powerful, well designed libraries. The libraries are also easy to explore and understand — you can call R functions without parameters and if they aren’t compiled C code, will output the code that makes up the function.

Matplotlib is powerful in Python, but it really isn’t as fast and easy to use ggplot2 and the grammar of geometry. Matplotlib does many things good, but it’s a lot more fiddly and the labeling functions don’t work all that well without writing a bunch of your own code. There is plotnine for Python which attempts to bring the best of ggplot2 to Python, but I find a lot of the best functions in R are missing. So it kind of sucks.

Calcuating Miles of the Speed Limit on Touring Routes

From the department of neat things you can do with a few lines of R code.

library(tidyverse)
library(arcpullr)

arcpullr::get_spatial_layer('https://gis.dot.ny.gov/hostingny/rest/services/Roadways/RoadwayData/FeatureServer/1') %>%
  filter(Posted_Speed_Limit_MPH > 10) %>%
  mutate(Length = st_length(geoms)) %>%
  st_drop_geometry() %>%
  group_by(Posted_Speed_Limit_MPH) %>%
  summarise(Miles = sum(Length) %>% units::set_units('mi') %>% units::drop_units())

Need a quick Census TIGER/Line Shapefile for a map you are making? πŸ—Ί

Need a quick Census TIGER/Line Shapefile for a map you are making? πŸ—Ί

It is super easy to get using R and tigris and sf libraries. I often will run a command like this in R terminal to get a Shapefile of the of the counties in Maine.

library(sf)
library(tigris)

counties(‘me’, cb=T) %>% write_sf(‘/tmp/maine.shp’)

Other common tigris functions I will use is state, county_subdivision, tract, block_group and school_districts which work similarly. A resolution parameter can be supplied to control the resolution downloaded, the cb=T flag obtains cartographic boundaries which follow coast lines, rather then actual boundaries.

How to access WMS Servers in R Programming Language

How to access WMS Servers in R Programming Language

I couldn’t find a good package to read WMS data into a SpatialRaster or for downloading. However I discovered that you can use sf’s built-in version of gdal_util query the WMS server’s information then use the sf built in gdal_translate to download the WMS image into a temporary file then load it into memory.

library(sf)
library(tidyverse)
library(terra)
rm(list=ls())

# obtain layer info from WMS server using sf built-in version of gdal_info
wms_url <- 'https://orthos.its.ny.gov/ArcGIS/services/wms/Latest/MapServer/WMSServer?'
ginfo <- sf::gdal_utils('info', str_c('WMS:',wms_url), quiet=T)

# extract layers and layer urls from returned layer info, 
# create a data table from this data
ldesc <- ginfo %>% str_match_all('SUBDATASET_(\\d)_DESC=(.*?)\n')
ldsc <- ldesc[[1]][,3]
lurl <- ginfo %>% str_match_all('SUBDATASET_(\\d)_NAME=(.*?)\n')
lurl <- lurl[[1]][,3]
wms_layers <- cbind(ldesc, lurl)
rm(ldesc, lurl)

# obtain from census bureau a shapefile of the Town of Cazenovia,
# then find the bounding box around it
bbx<- tigris::county_subdivisions('ny') %>% 
  filter(NAME == 'Cazenovia') %>% 
  st_transform(3857) %>% 
  st_bbox()

# Revise the WMS URL based on the above bounding box
# Web Mercator (3857) is best to use with WMS Servers as
# that is the native projection. As I wanted multiple layers
# I also revised the URL string to include all layers desired
# Seperated by a comma (explore the wms_layer table for details)
url <- wms_layers[4,2] %>%
  str_replace('LAYERS=(.*?)&', 'LAYERS=3,2,1,0&') %>%
  str_replace('SRS=(.*?)&', 'SRS=EPSG:3857&') %>%
  str_replace('BBOX=(.*?)$', str_c('BBOX=',paste(bbx, collapse=',')) ) 

# Use sf built-in gdal_utils to download the image of Cazenovia based
# on the URL, with an output size listed below. You can also download
# based on resolution by using the tr option that is commented out below.
# then load the temporary file into rast as a Spatial Raster for further 
# processing. In addition, adding -co COMPRESS=JPEG greatly reduces the size
# of aerial photography with minimal impacts on quality or loading speed.

t <- tempfile()
reso <-  c('-outsize','1920','1080','-co','COMPRESS=JPEG')
#reso <- c('-tr', '1','1','-co','COMPRESS=JPEG') # "meters" in the 3857 projection
sf::gdal_utils('translate', url, t, reso) 
r <- rast(t)

# display the spatial raster or whatever you would like to do with it
# the tiff is stored in the temporary location in the t variable
plot(r)

How to fix rselenium / wdman unable to start error

If you are getting an error starting geckodriver when using rselenium, you may want to try deleting the LICENSE.chromedriver file which accidentally attempts to be executed by wdman::geckodriver. You can do this simply in the Linux terminal by using this command. The xargs -r command only executes the rm command when there is a file matched to delete.

find ~/.local/share/ -name LICENSE.chromedriver -print | xargs -r rm

R Programming – IDW interpolation of Missing / NA Census Tract Data

You can do IDW interpolation of missing Census Tracts fairly easily in R using the gstat library. The key is to make sure you use a projected dataset. Other interpolation methods are covered here: https://rspatial.org/raster/analysis/4-interpolation.html

library(tidycensus)
library(tidyverse)
library(raster)
library(gstat)

# obtain census data on veteran status by tract and then
# reproject the shapefile geometery into a projected coordinate system

acs <- get_acs("tract",
state='ny',
survey='acs5',
var=
c('Total'='B21001_001',
'Veteran'='B21001_002'
),
cache_table = T,
geometry = T,
resolution='20m',
year = 2020,
output = "wide"
) %>% st_transform(26918)

# calculate the percentage of veterans per census tract
acs <- mutate(acs, vet_per = VeteranE/TotalE)

# create a copy of census tracts, dropping any NA values
# from vet_per field
vetNA <- acs %>% drop_na(vet_per)

# a raster should be created to do interpolation into
r <- raster(vetNA, res=1000)


# set the foruma based on field (vet_per) that contains
# the veterans percent to interpolate. This use IDW interpolation
# for all points, weighting farther ones less
gs <- gstat(formula = vet_per~1, locations = vetNA)

# interpolate the data (average based on nearby points)
nn <- interpolate(r, gs)

# extract the median value of the raster interpolation from the original shapefile,
# into a new column set as est
acs<- cbind(acs, est=exactextractr::exact_extract(nn, acs, fun='median'))

# replace any NA values with interpolated data so the map doesn't contain
# holes. You should probably mention that missing data was interpolated when
# sharing your map.
acs <- acs %>% mutate(vet_per = ifelse(is.na(vet_per), est, vet_per))