Here we shall learn how to:
- import and plot different types of maps, either at country or regional levels
- import distribution data and plot them onto the maps with climate data
- generate species and genus richness maps

1 Initial things to do

1.1 Set working directory

This is your own working directory, on your PC, update path below

setwd("C:/Users/couvreur/Documents/Gits/IUCN_palms")

1.2 Load needed libraries

optional: Install them first if not done already

install.packages(c("rnaturalearthdata",
                   "rgbif", "raster", "rnaturalearth", "tidyverse", "maptools", "ggmap", "tidyr", "rworldxtra", "dplyr", "scico", "knitr")

Load the libraries

library(ggmap)
library(ggplot2)
library(sf)
library(dplyr)
library(rnaturalearth)
library(rworldxtra)
library(tidyr)
library(scico)
library("raster")
library("knitr")

2 Plot country or regional maps

Select what country you want here. We do this here so we do not have to enter it everywhere afterwards. We also add the three letter code because it will be used below too. Three letter codes can be found here: 3_letters

country = "Ecuador"
country_3 = "ECU"

Or instead of giving a country, you can select the region you want, using coordinates in lat long. You can get coordinates here: link

xlat = c(-81, -74)
ylong = c(-5, 10)

We can plot maps at two levels of precision in terms of countries: Low and high.
Maps are plotted using ggplot2, see after.

2.1 Low precision maps

  1. Import a lower definition map from rnaturalearth package
worldMap <- ne_countries(scale = "medium", type = "countries", returnclass = "sf")
  1. Filter the country you want, creates a NCpoly_low object
CNpoly_low <- worldMap %>% filter(sovereignt == country)
  1. Simple plot of the map using plot() function
plot(CNpoly_low$geometry)

2.2 High presicion maps

The low precision map above is fine for general things, but the coastline and islands are not precise enough for pour type of data. Here we shall import a more precise one

  1. Import a higher definition map from rworldxtra package

This is part of the package. We need to transform it to a sp format after using st_as_sp function. This produces a CNhigh object

data(countriesHigh)
CNhigh = st_as_sf(countriesHigh)
  1. Filter the country you want, creates a NCpoly object
CNpoly <- CNhigh %>% filter(ADMIN == country)

2.3 High precision map with administration levels

In the above map, we only have the country. Below you can extract a high presicion map with admin levels. The country is defined by it three letter code, which was set above. The getData function from the raster package allows us to get all sorts of data. Here we specifiy the country and the admin level we want (1=regions)
country_3 variable is defined above (the three letter code of a country). This will download some files.

x <- getData('GADM', country=country_3, level=1)
x_proj = spTransform(x, CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

2.4 Plot country map with ggplot2

Plot the high definition country filtered above.

First, we set some extra limits around the country, so that it isn’t too tight. You can change the region around the country with dist. Here at 0.7.
This creates a limsCN object

limsCN <- st_buffer(CNpoly, dist = 0.7) %>% st_bbox

Plot map with ggplot2. This package has many different options that can be changed.
Here for the country without admin CNpoly

divpolPlot <-
  ggplot() +
  geom_sf(data = CNpoly) +
  coord_sf(
    xlim = c(limsCN["xmin"], limsCN["xmax"]),
    ylim = c(limsCN["ymin"], limsCN["ymax"])
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )
divpolPlot

2.4.1 We can crop around mainland Ecuador

Ecuador also includes the Galapagos, but, as you can see above, it makes a large map. Here we are not interested in the Galapagos. We can crop to specific regions if we want. Here we shall crop just to show mainland Ecuador.
You have to add the extent manually.
For the species/genus richness maps we shall use this cropped map.

Here we shall just crop to Ecuador, but you could crop to just the coast for exemple.

out <- st_crop(CNpoly, xmin = -81, xmax = -74, ymin = -5, ymax = 10)

Transform this to an sf object (sf is a sort of R spatial package):

ne = out$geometry
st_make_valid(ne)
## Geometry set for 1 feature 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -81 ymin: -5.007451 xmax: -75.22726 ymax: 1.434177
## CRS:            +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
ne = st_as_sf(ne)

Redefine the limits again:

limsNE <- st_buffer(ne, dist = 0.7) %>% st_bbox

Plot again:

divpolPlot <-
  ggplot() +
  geom_sf(data = ne) +
  coord_sf(
    xlim = c(limsNE["xmin"], limsNE["xmax"]),
    ylim = c(limsNE["ymin"], limsNE["ymax"])
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )
divpolPlot

## Plot regional map with ggplot2

Our dataset contains records from the Choco region. So we shall plot our records in the Choco region. Here we shall indicate a region using coordinates, rather than filtering for a country.
We use the CNhigh object.
Here coordinates for Ecuador cropped CAREFULL: xlat and ylong are defined above.

divpolPlot <-
  ggplot() +
  geom_sf(data = CNhigh) +
  coord_sf(
    xlim = xlat,
    ylim = ylong
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank(),
    axis.text.x = element_text(size = 6) # you can adjust the size of the x axis here
  ) + 
  labs(x='Longitude',y='Latitude',
       title="A detailed map",
       subtitle='looks good!',
       caption='Source: *raster* package GADM data')
divpolPlot

2.5 Add some climatic data to the plots

Here we shall download data from WorldClim and plot it with the map.

Download or import climatic data from WorldClim

If already downloaded and stored on your PC locally:

climate=getData('worldclim', var='bio',res=2.5, download=FALSE)

Otherwise (this step can take some time (ca. 123 Mb of data)):

climate=getData('worldclim', var='bio',res=2.5)

Crop to region of interest

Here you need to use the same lat long values as defined above for the region of interest. In the extent argument: 1: lat lower 2: lat upper 3: long left 4: long right

climate <- crop(climate,extent(-81, -73,-5, 11))

Raster the layer you want, here bio12

Here we extract values for bio12 variable, which is Annual Precipitation.
Other variables can be chosen here if you want:
- BIO12 = Annual Precipitation
- BIO1 = Annual Mean Temperature
- BIO15 = Precipitation Seasonality (Coefficient of Variation)

raster <- climate$bio12

Clean na values. We shall get rid of non existent values (NA):

rasdf <- as.data.frame(raster,xy=TRUE)%>%drop_na()

We can plot this climatic map:

ggplot()+
  geom_raster(aes(x=x,y=y,fill=bio12),data=rasdf) +
  geom_sf(fill='transparent', data = CNhigh) +
  coord_sf(
    xlim = xlat,
    ylim = ylong) +
  scale_fill_viridis_c('mm/yr',direction = -1) +
  labs(x='Longitude',y='Latitude',
       title="Mean annual precipitation",
       subtitle='Annual precipitation',
       caption='Source: WorldClim, 2020')+
  cowplot::theme_cowplot() +
  theme(panel.grid.major = element_line(color = gray(.5),
                                        linetype = 'dashed',
                                        size = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill=NA,color = 'black'),
        panel.ontop = TRUE,
        axis.text.x = element_text(size = 6) # you can adjust the size of the x axis here
        )

3 Plot species distribution data

3.1 Import original raw occurence dataset and clean

!Update paths and file name to your own!
Make sure what separator you have in your file, otherwise you will have error. best to use csv with ; as separator Here we shall import the Ecuador database, and filter it to include only Ecuador records (excluding other islands), but we shall keep the indet records for now.
When we import the records we shall use na.strings so that empty cells are flagged as NA. Finally, make sure that all records have a ddlat and ddlong value, otherwise there will be errors.

df <- read.csv("data/occs_cleaned_palm_GBIF_COU_QCA_20102021.csv", header = TRUE, sep = ";", stringsAsFactors = FALSE, na.strings = c("", "NA"))

df will contain all the records, inlcusing non identified specimens that we cannot use for diversity analyses. We shall drop those here and save it as df_id which will be used for the diversity analyses afterwards

df_id = df[!(is.na(df$genus) | is.na(df$species)), ]

Transform to SF object

It needs to be converted in to geosptial version with correct projection (here 4326 (WGS84)) see here: https://www.youtube.com/watch?v=2UZKm2Kc88U

df_id_sf = st_as_sf(df_id, coords = c(x = "ddlong", y = "ddlat"), crs = "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")

The text below is automatically generate in Rmarkdown.

Our database contains 4191 unique records representing 26 genera and 102 species.

3.2 Plot distribution data on regional map without climatic data

Here we shall plot all records in our database, identified or not (thus using the df object)
We have now added a geom_point argument to the ggplot script:

geom_point(data=df, aes(x=ddlong, y=ddlat), bg = rgb(red = 1, green = 0, blue = 0, alpha = 0.5), col = "black", pch = 21, cex=2) +

Color and shape of points can be changed in the script. In our file, the columns ddlat and ddlong indicate the coordinates.

divpolPlot <-
ggplot() +
  geom_sf(data = CNhigh) +
  geom_point(data=df, aes(x=ddlong, y=ddlat), bg = rgb(red = 1, green = 0, blue = 0, alpha = 0.5), col = "black", pch = 21, cex=2) +
  coord_sf(
    xlim = xlat,
    ylim = ylong
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank(), 
    axis.text.x = element_text(size = 6), # you can adjust the size of the x axis here
    plot.title = element_text(size = 10, hjust = 0.5) # you can adjust the size of main title here
                 ) +
    labs(
      x='Longitude',y='Latitude',
    title="Distribution of palm records \n in the Choco region",
    caption='Source: GBIF, QCA, Couvreur'
    )
    
    divpolPlot

3.3 Plot distribution data on regional map with climatic data

This ggplot script is different than the one above as it adds the climate data.

ggplot()+
  geom_raster(aes(x=x,y=y,fill=bio12),data=rasdf)+
  geom_sf(fill='transparent',data=CNpoly)+
  scale_fill_viridis_c('mm/yr',direction = -1)+
  coord_sf(expand=c(0,0))+
  geom_point(data=df, aes(x=ddlong, y=ddlat), bg = rgb(red = 1, green = 0, blue = 0, alpha = 0.5), col = "black", pch = 21, cex=1) +
  coord_sf(
    xlim = xlat,
    ylim = ylong
  ) +
  labs(x='Longitude',y='Latitude',
       title="Plam distribution \n in Choco's climate map",
       subtitle='Annual precipitation',
       caption='Source: WorldClim, 2020')+
  cowplot::theme_cowplot()+
  theme(panel.grid.major = element_line(color = gray(.5),
                                        linetype = 'dashed',
                                        size = 0.5),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill=NA,color = 'black'),
        axis.text.x = element_text(size = 6), # you can adjust the size of the x axis here
        plot.title = element_text(size = 13, hjust = 0.5), # you can adjust the size of main title here
        panel.ontop = TRUE)

4 Plot species richness

4.1 Prepare species distribution data

Here we shall just select the columns we what to use.
We shall use the df_id object here because we do not want undetermined records.
We shall create a new object with just the three columns we need: df_1.

Create a new dataframe with just species, ddlong and ddlat:

df_1 = df_id[c("genus", "species", "ddlat","ddlong")]

Transform to SF object

It needs to be converted in to geosptial version with correct projection (here 4326 (WGS84)) see here: https://www.youtube.com/watch?v=2UZKm2Kc88U

df_sf = st_as_sf(df_1, coords = c(x = "ddlong", y = "ddlat"), crs = 4326)

4.2 Create a grid

Here we shall create a grid (cells) and then calculate the number of species in each. This provides a view of species diversity.
As we are interested in coastal Ecuador we shall filter the file to have just the coastal regions of Ecuador.

Here we shall use the cropped Ecuador region to create the grid ne object created above.

Choose grid size in minutes (normally between 0.5 and 2)

w = 0.25

Create grid around the region of interest

CNGrid <- ne %>%
  st_make_grid(cellsize = w) %>%
  st_intersection(ne) %>%
  st_sf() %>%
  mutate(cellid = row_number())

4.3 Optional: Plot empty grid just to see what it looks like

gridPlot <-
  ggplot() +
  geom_sf(data = ne) +
  geom_sf(data = CNGrid) +
  coord_sf(
    xlim = c(limsNE["xmin"], limsNE["xmax"]),
    ylim = c(limsNE["ymin"], limsNE["ymax"])
  ) +
  scale_x_continuous(breaks = c(-84)) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )
gridPlot

4.4 Plot species richness analysis

Here we shall count the number of species per grid

4.4.1 Make richness grid

Here we shall first count the number of species per grid and save that in the richness_grid object.
we use dplyr package for this. %>% are pipes, that is we compile different functions one after the other.
First we overlap the distribution data with the grid. This produces a table where we have species per cell. Then we group by cellid and species, and count number of unique species per grid (tally), then we ungroup ad group again on cell id and count number of species per cellid.

richness_grid <- CNGrid %>% st_join(df_id_sf) %>%
  group_by(cellid, species) %>%
  tally() %>%
  filter(!is.na(species)) %>%
  ungroup() %>%
  mutate(overlap = ifelse(!is.na(species), 1, 0)) %>%
  group_by(cellid) %>%
  summarize(num_species = sum(overlap))

4.4.2 Plot richness

Now we can plot the grid with the species richness.
We use the scico package for the graded scale bar. This can be changed to other colors if needed, see scico.
In the script below, we indicate that grids with NA values should be transparent, so they will appear in grey.
For the tile we create a chart_title_sp object so we can automatically add in the size of the grids (variable w):

chart_title_sp <- paste("Species richness for cells of",w ,"degrees")

gridRichCN <-
  ggplot(richness_grid) +
  geom_sf(data = ne, fill = "grey", size = 0.1) +
  geom_sf(aes(fill = num_species), color = NA) +
  scale_fill_scico(palette = "lajolla", end = 0.9, na.value = "transparent") +
  coord_sf(
    xlim = c(limsNE["xmin"], limsNE["xmax"]),
    ylim = c(limsNE["ymin"], limsNE["ymax"])
  ) +
    scale_x_continuous(breaks = waiver()) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank(),
    axis.text.x = element_text(size = 7)
  ) + 
  labs(x='Longitude',y='Latitude',
       title = chart_title_sp,
       fill='Number of species')
gridRichCN

4.4.3 Plot genus richness

Finally, we can do the same thing for the number of genera. We simply change the counting on the genus column in the script below:

richness_grid_gen <- CNGrid %>% st_join(df_sf) %>%
        group_by(cellid, genus) %>%
        tally() %>%
        filter(!is.na(genus)) %>%
        ungroup() %>%
        mutate(overlap = ifelse(!is.na(genus), 1, 0)) %>%
        group_by(cellid) %>%
        summarize(num_gen = sum(overlap))

Now we can plot the genus richness: For the tile we create a chart_title_gn object so we can automatically add in the size of the grids (variable w):

chart_title_gn <- paste("Genus richness for cells of",w,"degrees")

gridRichCN_gen <-
  ggplot(richness_grid_gen) +
  geom_sf(data = CNpoly, fill = "grey", size = 0.1) +
  geom_sf(aes(fill = num_gen), color = NA) +
  scale_fill_scico(palette = "lajolla", end = 0.9, na.value = "transparent") +
  coord_sf(
    xlim = c(limsNE["xmin"], limsNE["xmax"]),
    ylim = c(limsNE["ymin"], limsNE["ymax"])
  ) +
  scale_x_continuous(breaks = waiver()) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "#2F4051"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank(),
    axis.text.x = element_text(size = 7)
  ) +  
  labs(x='Longitude',y='Latitude',
       title = chart_title_gn,
       fill='Number of genera')
gridRichCN_gen