R package of the week: corrr
This week we will have a look at the corrr package.
It offers some nice ways to visualize the correlations between multiple variables.
I will provide some examples using the varechem data set from the vegan package.
First, load the data and have a look at them.
library(vegan)  # provides the varechem data set
library(corrr)
data(varechem)
head(varechem)
## N P K Ca Mg S Al Fe Mn Zn Mo Baresoil Humdepth
## 18 19.8 42.1 139.9 519.4 90.0 32.3 39.0 40.9 58.1 4.5 0.3 43.9 2.2
## 15 13.4 39.1 167.3 356.7 70.7 35.2 88.1 39.0 52.4 5.4 0.3 23.6 2.2
## 24 20.2 67.7 207.1 973.3 209.1 58.1 138.0 35.4 32.1 16.8 0.8 21.2 2.0
## 27 20.6 60.8 233.7 834.0 127.2 40.7 15.4 4.4 132.0 10.7 0.2 18.7 2.9
## 23 23.8 54.5 180.6 777.0 125.8 39.5 24.2 3.0 50.1 6.6 0.3 46.0 3.0
## 19 22.8 40.9 171.4 691.8 151.4 40.8 104.8 17.6 43.6 9.1 0.4 40.5 3.8
## pH
## 18 2.7
## 15 2.8
## 24 3.0
## 27 2.8
## 23 2.7
## 19 2.7
As you can see, the data set contains various soil variables, such as nitrogen, phosphorus or the depth of the humus layer.
The basic function of the corrr package is correlate(), which works similarly to base R's cor().
corrr_table <- correlate(varechem, quiet = TRUE)
The main difference between cor() and correlate() is that the latter returns a tibble while the former returns a matrix.
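To see the difference for yourself, here is a small comparison sketch (assuming vegan and corrr are installed; base_cor and corrr_cor are illustrative names). Besides the class, note that cor() puts 1s on the diagonal, whereas correlate() fills it with NA by default.

```r
library(vegan)  # provides the varechem data set
library(corrr)
data(varechem)

base_cor  <- cor(varechem)                      # base R: a numeric matrix
corrr_cor <- correlate(varechem, quiet = TRUE)  # corrr: a tibble with a term column

class(base_cor)
class(corrr_cor)

# The diagonal: 1 in the matrix, NA in the tibble (row 1 is N)
base_cor["N", "N"]
corrr_cor$N[1]
```

The off-diagonal values are the same in both objects; only the container and the diagonal differ.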
When creating corrr_table, I set the quiet argument to TRUE.
This suppresses the message that correlate() otherwise prints about the correlation method and how missing values are handled.
Both options can be set explicitly with the method and use arguments.
Here, I kept their default values: Pearson correlation (method = "pearson") and pairwise complete observations (use = "pairwise.complete.obs").
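If you want something other than the defaults, you pass them explicitly; method and use accept the same values as base R's cor(). A minimal sketch using Spearman rank correlations (spearman_table is an illustrative name):

```r
library(vegan)
library(corrr)
data(varechem)

# Spearman rank correlations instead of the default Pearson;
# use is spelled out although "pairwise.complete.obs" is already the default
spearman_table <- correlate(
  varechem,
  method = "spearman",
  use    = "pairwise.complete.obs",
  quiet  = TRUE
)
spearman_table
```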
The package uses a tibble instead of a matrix so that we can use the familiar tidyverse functions, for example to show only the terms whose correlation with zinc is above 0.7 …
library(dplyr)  # filter(), pull() and select()
corrr_table |> filter(Zn > 0.7) |> pull(term)
## [1] "P" "Mg" "S"
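Since filter() takes any expression, a small variation (my own, not taken from the corrr documentation) also catches strong negative correlations by filtering on the absolute value. For zinc it returns the same three terms, because none of its correlations in the tables of this post is below -0.7.

```r
library(vegan)
library(corrr)
library(dplyr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# abs() keeps strong positive and strong negative correlations with zinc;
# the NA on the diagonal (Zn with itself) is dropped by filter()
strong_zn <- corrr_table |>
  filter(abs(Zn) > 0.7) |>
  pull(term)
strong_zn
```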
… or to show only the correlations of nitrogen and sulfur …
corrr_table |> select(N, S)
## # A tibble: 14 x 2
## N S
## <dbl> <dbl>
## 1 NA -0.262
## 2 -0.251 0.753
## 3 -0.147 0.844
## 4 -0.271 0.540
## 5 -0.164 0.650
## 6 -0.262 NA
## 7 -0.0434 0.360
## 8 0.165 0.0565
## 9 0.0792 0.275
## 10 -0.132 0.710
## 11 -0.0577 0.432
## 12 0.106 0.0808
## 13 0.0760 0.158
## 14 -0.0421 -0.187
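Note that select() drops the term column, so the output above no longer tells you which row belongs to which variable. A sketch that keeps the labels by naming term explicitly (n_s is an illustrative name):

```r
library(vegan)
library(corrr)
library(dplyr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Name term explicitly so each row stays labelled with its variable
n_s <- corrr_table |> select(term, N, S)
n_s
```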
… or to ask what the mean correlation of nitrogen and sulfur with all other variables is.
library(purrr)  # map_dbl()
corrr_table |>
select(N, S) |>
map_dbl(~ mean(., na.rm = TRUE))
## N S
## -0.07261118 0.33921879
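The same means can also be computed without purrr, using dplyr's summarise() and across(). A minimal sketch, assuming dplyr 1.0 or later (which introduced across()); mean_cor is an illustrative name:

```r
library(vegan)
library(corrr)
library(dplyr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Column means of the N and S correlations;
# na.rm = TRUE skips the NA on the diagonal
mean_cor <- corrr_table |>
  summarise(across(c(N, S), ~ mean(.x, na.rm = TRUE)))
mean_cor
```

It should reproduce the values of the map_dbl() version, but as a one-row tibble instead of a named vector.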
There are also some new data wrangling functions that corrr introduces.
shave() sets either the upper triangle (upper = TRUE, the default) or the lower triangle (upper = FALSE) to NA.
shave(corrr_table, upper = TRUE)
## # A tibble: 14 x 15
## term N P K Ca Mg S Al Fe Mn
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 N NA NA NA NA NA NA NA NA NA
## 2 P -0.251 NA NA NA NA NA NA NA NA
## 3 K -0.147 0.754 NA NA NA NA NA NA NA
## 4 Ca -0.271 0.737 0.665 NA NA NA NA NA NA
## 5 Mg -0.164 0.598 0.628 0.798 NA NA NA NA NA
## 6 S -0.262 0.753 0.844 0.540 0.650 NA NA NA NA
## 7 Al -0.0434 0.0453 0.119 -0.206 -0.118 0.360 NA NA NA
## 8 Fe 0.165 -0.128 -0.0941 -0.332 -0.202 0.0565 0.824 NA NA
## 9 Mn 0.0792 0.536 0.537 0.443 0.258 0.275 -0.470 -0.436 NA
## 10 Zn -0.132 0.702 0.600 0.678 0.708 0.710 -0.0551 -0.312 0.364
## 11 Mo -0.0577 0.172 0.0682 -0.157 0.0348 0.432 0.510 0.221 -0.205
## 12 Baresoil 0.106 0.0139 0.169 0.178 0.239 0.0808 -0.400 -0.457 0.246
## 13 Humdepth 0.0760 0.152 0.266 0.244 0.371 0.158 -0.494 -0.494 0.510
## 14 pH -0.0421 -0.0294 -0.233 0.0914 -0.0925 -0.187 0.418 0.440 -0.389
## # ... with 5 more variables: Zn <dbl>, Mo <dbl>, Baresoil <dbl>,
## # Humdepth <dbl>, pH <dbl>
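With upper = FALSE it is the other way round: the lower triangle becomes NA and the upper one is kept. A short sketch (upper_only is an illustrative name):

```r
library(vegan)
library(corrr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Blank the lower triangle and keep the upper one
upper_only <- shave(corrr_table, upper = FALSE)
upper_only
```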
With rearrange(), we can reorder the rows and columns so that highly correlated variables end up next to one another. I will show it below when we come to the plots, because that is where it is most useful.
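If you are curious what rearrange() does to the table itself, a quick sketch is to compare the order of the term column before and after, using the default method (original_order and new_order are illustrative names):

```r
library(vegan)
library(corrr)
library(dplyr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Same variables, different order
original_order <- corrr_table |> pull(term)
new_order <- corrr_table |> rearrange() |> pull(term)
original_order
new_order
```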
The focus() function is very similar to select(). The differences are that focus() automatically keeps the term column and, by default, drops the rows of the selected variables, so the correlation of N with itself does not show up.
focus(corrr_table, N) |>
head()
## # A tibble: 6 x 2
## term N
## <chr> <dbl>
## 1 P -0.251
## 2 K -0.147
## 3 Ca -0.271
## 4 Mg -0.164
## 5 S -0.262
## 6 Al -0.0434
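focus() also accepts several variables at once, and its mirror argument flips what happens to the rows: with mirror = TRUE, only the rows of the focused variables are kept, which gives a small square table. A sketch with illustrative object names:

```r
library(vegan)
library(corrr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Default: columns N and S, all rows except those of N and S
focused <- focus(corrr_table, N, S)
focused

# mirror = TRUE: only the rows of N and S, i.e. a 2 x 2 table
n_s_square <- focus(corrr_table, N, S, mirror = TRUE)
n_s_square
```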
The last in the bunch is fashion(), which turns the table into a nicely formatted version for printing: values are rounded to two decimals, leading zeros are dropped and NAs are replaced by empty cells.
fashion(corrr_table)
## term N P K Ca Mg S Al Fe Mn Zn Mo Baresoil
## 1 N -.25 -.15 -.27 -.16 -.26 -.04 .17 .08 -.13 -.06 .11
## 2 P -.25 .75 .74 .60 .75 .05 -.13 .54 .70 .17 .01
## 3 K -.15 .75 .66 .63 .84 .12 -.09 .54 .60 .07 .17
## 4 Ca -.27 .74 .66 .80 .54 -.21 -.33 .44 .68 -.16 .18
## 5 Mg -.16 .60 .63 .80 .65 -.12 -.20 .26 .71 .03 .24
## 6 S -.26 .75 .84 .54 .65 .36 .06 .27 .71 .43 .08
## 7 Al -.04 .05 .12 -.21 -.12 .36 .82 -.47 -.06 .51 -.40
## 8 Fe .17 -.13 -.09 -.33 -.20 .06 .82 -.44 -.31 .22 -.46
## 9 Mn .08 .54 .54 .44 .26 .27 -.47 -.44 .36 -.20 .25
## 10 Zn -.13 .70 .60 .68 .71 .71 -.06 -.31 .36 .28 .04
## 11 Mo -.06 .17 .07 -.16 .03 .43 .51 .22 -.20 .28 .03
## 12 Baresoil .11 .01 .17 .18 .24 .08 -.40 -.46 .25 .04 .03
## 13 Humdepth .08 .15 .27 .24 .37 .16 -.49 -.49 .51 .14 .06 .59
## 14 pH -.04 -.03 -.23 .09 -.09 -.19 .42 .44 -.39 -.09 -.17 -.53
## Humdepth pH
## 1 .08 -.04
## 2 .15 -.03
## 3 .27 -.23
## 4 .24 .09
## 5 .37 -.09
## 6 .16 -.19
## 7 -.49 .42
## 8 -.49 .44
## 9 .51 -.39
## 10 .14 -.09
## 11 .06 -.17
## 12 .59 -.53
## 13 -.72
## 14 -.72
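fashion() combines nicely with shave(), and its output can be tuned further. A sketch assuming the decimals and na_print arguments of fashion(), here with one decimal place and a dash instead of an empty cell (pretty_table is an illustrative name):

```r
library(vegan)
library(corrr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Blank the upper triangle first, then format the remaining values
pretty_table <- corrr_table |>
  shave() |>
  fashion(decimals = 1, na_print = "-")
pretty_table
```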
While all of this is nice, the really cool thing about corrr is its plots:
corrr_table |>
rearrange() |>
rplot()
## Registered S3 method overwritten by 'seriation':
## method from
## reorder.hclust vegan
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.

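rplot() can also be combined with shave() to show only one triangle, and its print_cor argument writes the coefficients onto the plot. A sketch (p is an illustrative name); the result is a ggplot object, so it can be modified further with ggplot2:

```r
library(vegan)
library(corrr)
data(varechem)
corrr_table <- correlate(varechem, quiet = TRUE)

# Reorder, keep only the lower triangle, and print the coefficients
p <- corrr_table |>
  rearrange() |>
  shave() |>
  rplot(print_cor = TRUE)
p
```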
Another plot type is the network plot:
corrr_table |>
network_plot()

In the network plot, highly correlated variables appear closer together and are joined by stronger (darker) paths.
The function includes arguments to exclude correlations whose absolute value is below some threshold (min_cor), to change the color scale (colours), and to choose whether the paths should be curved or straight (curved).
So we could modify the above plot to this:
corrr_table |>
network_plot(
min_cor = 0.5,
colours = c("green", "black", "blue"),
curved = FALSE
)

Yes, the first one was prettier, but now you know what you can do with these arguments.