TUTORIAL 2 - Customized training database
Marco Pittarello and Davide Barberis
Source: vignettes/TUTORIAL_2_-_Customized_training_database.Rmd
If the study area lies outside the Piedmont Region (North-Western
Italy), it is advisable to provide a customized training
database (see the “Introduction to ResNatSeed” vignette for
definitions) through the trainingDB function before using the RestInd function.
To be processed by the trainingDB function, the vegetation and
topographical variables database (see the “Introduction to ResNatSeed”
vignette for definitions) must be structured as follows (see Table 1 for an
example):
- Rows of the dataframe are the surveys
- The first column reports the survey codes
- The second, third and fourth columns report the topographical
variables, respectively:
  - Elevation: expressed in meters above sea level (m a.s.l.)
  - Slope: expressed in degrees (°)
  - Aspect: expressed in degrees from North (°N). It is converted to “southness” to avoid circular-variable issues in the models run by RestInd (Chang et al. 2004)
- From the fifth column onwards, each column holds one plant species, with the species name as the column name. Species names do not need to be coded; they can be left in full. Abbreviation codes are then created automatically in CEP names format, i.e. the eight-letter species abbreviations used by the Cornell Ecology Programs (CEP). The CEP names are the codes used to formulate the seed mixture or donor grassland composition.
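As a rough illustration of the layout described above, the following base R sketch builds a toy database with the required column order and runs a few sanity checks on it. The values, the object name `veg.toy` and the helper `southness_sketch` are invented for illustration. The southness formula used here, 180 - |aspect - 180|, is the common way of folding aspect onto a 0-180 scale; that the package computes it in exactly this way is an assumption.

```r
# Toy vegetation and topographical database (illustrative values only,
# not taken from the example dataset).
veg.toy <- data.frame(
  Cod_ril   = c("survey_1", "survey_2", "survey_3"),  # column 1: survey codes
  elevation = c(1950, 2590, 2630),                     # column 2: m a.s.l.
  slope     = c(24, 32, 44),                           # column 3: degrees
  aspect    = c(244, 145, 240),                        # column 4: degrees from North
  `Nardus stricta` = c(0, 5, 0),                       # column 5 onwards: species
  `Poa alpina`     = c(0, 4, 1.5),
  check.names = FALSE,       # keep full species names as column names
  stringsAsFactors = FALSE
)

# Basic structural checks matching the description in the text.
stopifnot(
  is.character(veg.toy[[1]]),
  all(vapply(veg.toy[-1], is.numeric, logical(1))),
  all(veg.toy$slope >= 0 & veg.toy$slope <= 90),
  all(veg.toy$aspect >= 0 & veg.toy$aspect <= 360)
)

# Hypothetical helper: fold aspect (0-360) into southness (0 = North, 180 = South).
# Assumed formula, not confirmed against the package source.
southness_sketch <- function(aspect) 180 - abs(aspect - 180)

southness_sketch(veg.toy$aspect)
```

With this folding, aspects of 10° and 350° both map to a southness of 10, which removes the artificial jump between 359° and 0°.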
Table 1. Example of the structure of the
vegetation and topographical variables database to provide to
the trainingDB function. As the printed dimensions show, this database has
532 rows (vegetation surveys) and 703 columns (the survey code, three
topographical variables and 699 plant species)
#> # A tibble: 532 × 703
#> Cod_ril eleva…¹ slope aspect Nardu…² Festu…³ Carex…⁴ Festu…⁵ Trifo…⁶ Poa a…⁷
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 survey_… 1949. 23.9 244. 0 8.87 4.84 1.61 0 0
#> 2 survey_… 2589. 31.9 145. 5.17 0 12.9 0 0 4.31
#> 3 survey_… 2634. 44.1 240. 0 0 2.82 0 0 1.41
#> 4 survey_… 2593. 42.3 188. 0 0 18.3 0 0 0.962
#> 5 survey_… 2195. 30.3 284. 0 2.59 0 0 0 2.59
#> 6 survey_… 2105. 8.99 185. 0 23.0 0 0 0 15.9
#> 7 survey_… 2082. 31.0 235. 1.83 6.10 4.27 1.22 0.610 0
#> 8 survey_… 2078. 20.0 275. 0 8.33 3.70 0 0 4.63
#> 9 survey_… 1880. 5.24 170. 1.53 29.8 0 0 0 20.6
#> 10 survey_… 2053. 40.6 176. 0 1.53 0 0.763 0 0
#> # … with 522 more rows, 693 more variables: `Festuca paniculata` <dbl>,
#> # `Agrostis capillaris` <dbl>, `Brachypodium rupestre` <dbl>,
#> # `Festuca violacea aggr.` <dbl>, `Anthoxanthum odoratum aggr.` <dbl>,
#> # `Phleum rhaeticum` <dbl>, `Plantago aggr. alpina/serpentina` <dbl>,
#> # `Helianthemum nummularium` <dbl>, `Achillea millefolium aggr.` <dbl>,
#> # `Thymus serpyllum aggr.` <dbl>, `Avenella flexuosa` <dbl>,
#> # `Ranunculus montanus aggr.` <dbl>, `Sesleria caerulea` <dbl>, …
At this point, the customized training database can be generated by
passing the vegetation and topographical variables database to the data
argument of the trainingDB function. Two further arguments select the
species eligible for statistical modeling:
- spe.freq: minimum frequency of each species, i.e. the minimum number of surveys in which it must occur. It is advisable to set values greater than or equal to 30 to allow appropriate statistical modeling.
- min.spe.abundance: minimum abundance threshold (abundance must be greater than this value) for each species in each survey. When NULL, the threshold is set to 0.
library(ResNatSeed)

training.custom <- trainingDB(
  data = veg.composition.example,
  spe.freq = 30,         # only species occurring in at least 30 surveys will be retained
  min.spe.abundance = 1  # only species with at least 1% relative abundance will be retained
)
#> [1] "A list containing two dataframes has been created:"
#> [1] "1 - cep.names: database with the list of species suitable for modeling and their species codes (CEP names) to be used for the seed mixture or donor grassland composition"
#> [1] "2 - trainingDB.ResNatSeed: database to use in the 'RestInd' function"
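To picture what the two thresholds do, here is a toy sketch in base R. It is not the package's code: the object names are hypothetical and the values are invented. It assumes that a species counts as occurring in a survey when its abundance exceeds the minimum abundance, and that it is retained when it occurs in at least the minimum number of surveys.

```r
# Toy species-by-survey abundances (rows = surveys, columns = species).
abund.toy <- data.frame(
  `Nardus stricta`     = c(0, 5.2, 1.8, 0, 3.1),
  `Poa alpina`         = c(0, 4.3, 0.9, 0.5, 0),
  `Carex sempervirens` = c(4.8, 12.9, 2.8, 18.3, 0),
  check.names = FALSE
)

spe.freq.toy  <- 3  # stands in for spe.freq (30 is advised for real data)
min.abund.toy <- 1  # stands in for min.spe.abundance

# Assumed logic: count the surveys in which each species exceeds the
# abundance threshold, then keep species reaching the frequency threshold.
n.occurrences <- colSums(abund.toy > min.abund.toy)
eligible <- names(n.occurrences)[n.occurrences >= spe.freq.toy]

n.occurrences
eligible
```

In this toy case Poa alpina exceeds the abundance threshold in only one survey, so it would be dropped, while the other two species would be kept.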
The trainingDB function creates two dataframes and stores them in a
list, named ‘training.custom’ in this example. The two dataframes are:
- cep.names: dataframe with the list of species suitable for modeling and their species codes (CEP names), to be used for the seed mixture or donor grassland composition. In this example, 65 of the 699 species columns in the example database are eligible for modeling
training.custom$cep.names
#> species cep.names
#> 1 Achillea millefolium aggr. Achiaggr
#> 2 Agrostis alpina Agroalpi
#> 3 Agrostis capillaris Agrocapi
#> 4 Alchemilla vulgaris aggr. Alchaggr
#> 5 Alopecurus alpinus Alopalpi
#> 6 Antennaria dioica Antedioi
#> 7 Anthoxanthum odoratum aggr. Anthaggr
#> 8 Arnica montana Arnimont
#> 9 Avenella flexuosa Avenflex
#> 10 Brachypodium rupestre Bracrupe
#> 11 Briza media Brizmedi
#> 12 Campanula scheuchzeri Campsche
#> 13 Carduus defloratus Carddefl
#> 14 Carex sempervirens Caresemp
#> 15 Carlina acaulis Carlacau
#> 16 Centaurea uniflora Centunif
#> 17 Cerastium arvense Ceraarve
#> 18 Crocus albiflorus Crocalbi
#> 19 Cruciata glabra Crucglab
#> 20 Dactylis glomerata Dactglom
#> 21 Dianthus pavonius Dianpavo
#> 22 Festuca ovina aggr. Festaggr
#> 23 Festuca violacea aggr. Festaggr.1
#> 24 Festuca paniculata Festpani
#> 25 Festuca quadriflora Festquad
#> 26 Festuca rubra Festrubr
#> 27 Galium lucidum aggr. Galiaggr
#> 28 Gentiana acaulis aggr. Gentaggr
#> 29 Geum montanum Geummont
#> 30 Helianthemum nummularium Helinumm
#> 31 Hieracium glanduliferum Hierglan
#> 32 Hieracium pilosella Hierpilo
#> 33 Juncus trifidus Junctrif
#> 34 Leontodon helveticus Leonhelv
#> 35 Leontodon hispidus Leonhisp
#> 36 Lotus corniculatus Lotucorn
#> 37 Luzula lutea Luzulute
#> 38 Nardus stricta Nardstri
#> 39 Onobrychis montana Onobmont
#> 40 Phleum rhaeticum Phlerhae
#> 41 Phyteuma betonicifolium Phytbeto
#> 42 Plantago aggr. alpina/serpentina Planserp
#> 43 Poa alpina Poaalpi
#> 44 Poa pratensis Poaprat
#> 45 Poa variegata Poavari
#> 46 Polygonum bistorta Polybist
#> 47 Polygonum viviparum Polyvivi
#> 48 Potentilla crantzii Potecran
#> 49 Potentilla grandiflora Potegran
#> 50 Ranunculus acris Ranuacri
#> 51 Ranunculus montanus aggr. Ranuaggr
#> 52 Ranunculus kuepferi Ranukuep
#> 53 Salix herbacea Saliherb
#> 54 Sesleria caerulea Seslcaer
#> 55 Taraxacum officinale aggr. Taraaggr
#> 56 Thymus serpyllum aggr. Thymaggr
#> 57 Trifolium alpinum Trifalpi
#> 58 Trifolium pratense Trifprat
#> 59 Trifolium repens Trifrepe
#> 60 Trifolium thalii Trifthal
#> 61 Trisetum flavescens Trisflav
#> 62 Vaccinium gaultherioides Vaccgaul
#> 63 Vaccinium myrtillus Vaccmyrt
#> 64 Veronica allionii Veroalli
#> 65 Viola calcarata Violcalc
- trainingDB.ResNatSeed: training database to use in the RestInd function
head(training.custom$trainingDB.ResNatSeed)
#> Codril elevation slope southness species cep.names
#> 1 survey_310 2093.494 39.45094 100.181328 Achillea millefolium aggr. Achiaggr
#> 2 survey_40 2446.180 33.13994 0.308533 Achillea millefolium aggr. Achiaggr
#> 3 survey_516 2042.679 29.37408 178.086090 Achillea millefolium aggr. Achiaggr
#> 4 survey_198 1630.240 22.73879 129.326096 Achillea millefolium aggr. Achiaggr
#> 5 survey_133 1945.885 12.64068 101.486450 Achillea millefolium aggr. Achiaggr
#> 6 survey_58 2435.455 26.73384 162.208801 Achillea millefolium aggr. Achiaggr
#> abundance
#> 1 0.000000
#> 2 0.000000
#> 3 0.000000
#> 4 2.586207
#> 5 8.982036
#> 6 0.000000
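The codes in training.custom$cep.names follow a recognizable pattern: up to four letters from the genus plus up to four letters from the last word of the name, with a numeric suffix when two species collide (Festaggr and Festaggr.1 above). The base R sketch below reproduces that pattern for a few names from the list. The helper `cep_sketch` is hypothetical and only approximates the behaviour; the package's own abbreviation routine may treat edge cases, such as single-word names, differently.

```r
# Hypothetical helper approximating CEP-style codes:
# first four letters of the first word + first four letters of the last word,
# made unique with a numeric suffix when codes collide.
cep_sketch <- function(species) {
  parts <- strsplit(species, "[^[:alpha:]]+")  # split on spaces, dots, slashes
  codes <- vapply(parts, function(p) {
    p <- p[nzchar(p)]
    paste0(substr(p[1], 1, 4), substr(p[length(p)], 1, 4))
  }, character(1))
  make.unique(codes)
}

cep_sketch(c(
  "Festuca ovina aggr.",
  "Festuca violacea aggr.",
  "Poa alpina",
  "Plantago aggr. alpina/serpentina"
))
```

The collision case matters in practice: in the example list, Festaggr refers to Festuca ovina aggr. and Festaggr.1 to Festuca violacea aggr., so the codes should always be looked up in cep.names rather than guessed.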
Using the species listed in training.custom$cep.names, create a database
containing the seed mixture or donor grassland composition. It must be a
dataframe with two columns:
- First column: species code abbreviated in CEP names format
- Second column: abundance of each species. Abundance must be a number between 0 and 100, and can be either a species relative abundance or a species cover (sensu Pittarello et al. (2016); Verdinelli et al. (2022)).
In this example, the donor grassland composition is characterized by four species:
- Dactylis glomerata (CEP name: Dactglom), abundance: 25%
- Festuca ovina aggr. (CEP name: Festaggr), abundance: 35%
- Thymus serpyllum aggr. (CEP name: Thymaggr), abundance: 15%
- Lotus corniculatus (CEP name: Lotucorn), abundance: 8%
The total abundance amounts to 83%. The total abundance of the seed mixture or donor grassland composition does not need to add up to 100%.
We can generate the dataframe of the donor grassland composition:
donor.composition<-data.frame(
species=c("Dactglom","Festaggr","Thymaggr","Lotucorn"),
abundance=c(25,35,15,8)
)
donor.composition
#> species abundance
#> 1 Dactglom 25
#> 2 Festaggr 35
#> 3 Thymaggr 15
#> 4 Lotucorn 8
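Before passing a composition to RestInd, it can help to check it against the rules listed above. The following base R sketch does so with a hypothetical helper, `check_composition`, and a short stand-in vector of valid codes copied from the cep.names output. It is an illustration of the stated requirements, not a function provided by the package.

```r
# Stand-in for training.custom$cep.names$cep.names (a few codes copied
# from the output shown earlier).
valid.codes <- c("Dactglom", "Festaggr", "Festaggr.1", "Lotucorn", "Thymaggr")

donor.toy <- data.frame(
  species   = c("Dactglom", "Festaggr", "Thymaggr", "Lotucorn"),
  abundance = c(25, 35, 15, 8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks the two-column layout, the 0-100 bounds and
# that every code is a known CEP name; returns the total abundance.
check_composition <- function(composition, valid.codes) {
  stopifnot(
    ncol(composition) == 2,
    is.numeric(composition[[2]]),
    all(composition[[2]] >= 0 & composition[[2]] <= 100),
    !anyDuplicated(composition[[1]])
  )
  unknown <- setdiff(composition[[1]], valid.codes)
  if (length(unknown) > 0) {
    stop("Codes not found in cep.names: ", paste(unknown, collapse = ", "))
  }
  sum(composition[[2]])
}

total.abundance <- check_composition(donor.toy, valid.codes)
total.abundance
```

With the real objects, `valid.codes` would be taken from training.custom$cep.names$cep.names and `donor.toy` replaced by donor.composition.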
It is now possible to use the RestInd function to calculate the
Suitability Index (SI) and Reliability Index (RI) from the donor grassland
composition (dataframe ‘donor.composition’) and the elevation, slope,
and aspect of the restoration site:
- elevation: 1600 m a.s.l.
- slope: 10°
- aspect: 110° N
Remember to pass the training database created above,
i.e. training.custom$trainingDB.ResNatSeed, to the trainingDB argument:
RestInd(trainingDB = training.custom$trainingDB.ResNatSeed,
composition=donor.composition,
elevation=1600,
slope=10,
aspect=110
)
#> $DESCRIPTIVES
#> cep.names species n.obs min.ele max.ele min.slope
#> Dactglom Dactglom Dactylis glomerata 31 760 2254 1.8
#> Festaggr Festaggr Festuca ovina aggr. 39 1447 2756 1.5
#> Lotucorn Lotucorn Lotus corniculatus 45 1010 2756 3.8
#> Thymaggr Thymaggr Thymus serpyllum aggr. 39 985 2756 5.9
#> max.slope min.south max.south
#> Dactglom 45 18 173
#> Festaggr 47 6.9 173
#> Lotucorn 45 1.9 179
#> Thymaggr 45 6.9 179
#>
#> $SPECIES_ABUNDANCES
#> cep.names species PMA POA ratio R2.adj RMSE SmDgA EA
#> 1 Dactglom Dactylis glomerata 16.6 27.0 0.61 0.36 5.3 25 15.25
#> 2 Festaggr Festuca ovina aggr. 14.0 37.3 0.38 0.44 7.8 35 13.30
#> 3 Lotucorn Lotus corniculatus 2.8 22.9 0.12 0.61 3.2 8 0.96
#> 4 Thymaggr Thymus serpyllum aggr. 5.7 8.5 0.67 -0.06 3.6 15 10.05
#>
#> $INDEXES
#> SI RI
#> 1 0.48 1
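The printed numbers can be cross-checked by hand. In this output, ratio matches PMA / POA, EA matches ratio × SmDgA, and SI matches sum(EA) / sum(SmDgA), all up to rounding. The sketch below only redoes that arithmetic on values copied from the output. It is a reading of this one example, not the package's documented definition of the indexes (see the “Introduction to ResNatSeed” vignette for those).

```r
# Values copied from the $SPECIES_ABUNDANCES table above.
abund.out <- data.frame(
  cep.names = c("Dactglom", "Festaggr", "Lotucorn", "Thymaggr"),
  PMA   = c(16.6, 14.0, 2.8, 5.7),
  POA   = c(27.0, 37.3, 22.9, 8.5),
  ratio = c(0.61, 0.38, 0.12, 0.67),
  SmDgA = c(25, 35, 8, 15),
  EA    = c(15.25, 13.30, 0.96, 10.05)
)

# Relationships observed in this output (assumed, not taken from the docs).
ratio.check <- round(abund.out$PMA / abund.out$POA, 2)
ea.check    <- abund.out$ratio * abund.out$SmDgA
si.check    <- round(sum(abund.out$EA) / sum(abund.out$SmDgA), 2)

stopifnot(
  isTRUE(all.equal(ratio.check, abund.out$ratio)),
  isTRUE(all.equal(ea.check, abund.out$EA))
)
si.check
```

Read this way, the four species sum to an EA of 39.56 against a donor total of 83, which is consistent with the SI of 0.48 reported under $INDEXES.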