Wrapper that trains models based on spectral data to predict reference values and reports model performance statistics.
test_spectra(
train.data,
num.iterations,
test.data = NULL,
pretreatment = 1,
k.folds = 5,
proportion.train = 0.7,
tune.length = 50,
model.method = "pls",
best.model.metric = "RMSE",
stratified.sampling = TRUE,
cv.scheme = NULL,
trial1 = NULL,
trial2 = NULL,
trial3 = NULL,
split.test = FALSE,
seed = 1,
verbose = TRUE,
wavelengths = deprecated(),
preprocessing = deprecated(),
output.summary = deprecated(),
rf.variable.importance = deprecated()
)
train.data: data.frame object of spectral data for input into a spectral prediction model. The first column contains unique identifiers and the second contains reference values, followed by the spectral columns. Include no other columns to the right of the spectra. Column names of spectra must start with "X" and the reference column must be named "reference".
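As a quick illustration of the layout described above, the following base R sketch builds a small input table and checks it against the stated rules. The values, the wavelength range, and the column name unique.id are made up for illustration (the text only fixes the position of the identifier column, not its name).

```r
# Hypothetical toy data: 4 samples measured at 5 wavelengths (350-354 nm).
set.seed(1)
toy.spectra <- matrix(
  runif(20),
  nrow = 4,
  dimnames = list(NULL, paste0("X", 350:354))
)

# Identifier first, reference second, spectra last, nothing to the right.
toy.train <- data.frame(
  unique.id = paste0("sample_", 1:4),
  reference = c(31.2, 35.8, 40.1, 28.9),
  toy.spectra
)

# Check the layout rules stated in the argument description.
stopifnot(
  names(toy.train)[2] == "reference",
  all(startsWith(names(toy.train)[-(1:2)], "X")),
  !anyDuplicated(toy.train$unique.id)
)
```

A table that passes these checks has the column order the function expects; it says nothing about whether the spectra themselves are sensible.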
num.iterations: Number of training iterations to perform.
test.data: data.frame with the same specifications as train.data. Use if a specific test set is desired for hyperparameter tuning. If NULL, the function will automatically train with a stratified sample of 70%. Default is NULL.
pretreatment: Number or list of numbers 1:13 corresponding to the desired pretreatment method(s):
1. Raw data (default)
2. Standard normal variate (SNV)
3. SNV and first derivative
4. SNV and second derivative
5. First derivative
6. Second derivative
7. Savitzky–Golay filter (SG)
8. SNV and SG
9. Gap-segment derivative (window size = 11)
10. SG and first derivative (window size = 5)
11. SG and first derivative (window size = 11)
12. SG and second derivative (window size = 5)
13. SG and second derivative (window size = 11)
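To make two of the listed pretreatments concrete, here is a minimal base R sketch of standard normal variate scaling and a plain first-difference derivative. It is an illustration under simple assumptions, not the package's implementation: the helper names snv and first_difference are hypothetical, and the package's derivative may use different gap or smoothing settings.

```r
# Hypothetical helper: SNV centres and scales each spectrum (row)
# by that spectrum's own mean and standard deviation.
snv <- function(spectra) {
  t(apply(spectra, 1, function(x) (x - mean(x)) / sd(x)))
}

# Hypothetical helper: first derivative approximated by the difference
# between adjacent wavelengths (one column fewer than the input).
first_difference <- function(spectra) {
  t(apply(spectra, 1, diff))
}

# Toy spectra: 2 samples x 5 wavelengths.
toy.raw <- matrix(
  c(1, 2, 3, 4, 6,
    2, 4, 6, 8, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, paste0("X", 350:354))
)

toy.snv    <- snv(toy.raw)                    # roughly pretreatment 2
toy.snv.d1 <- first_difference(toy.snv)       # roughly pretreatment 3
```

After SNV every row has mean 0 and standard deviation 1, which removes additive and multiplicative scatter differences between samples before any derivative is taken.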
k.folds: Number of folds for k-fold cross-validation during model training. Default is 5.
proportion.train: Fraction of samples to include in the training set. Default is 0.7.
tune.length: Number delineating the search space for tuning of the PLSR hyperparameter ncomp. Must be set to 5 when using the random forest algorithm (model.method == "rf"). Default is 50.
model.method: Model type to use for training. Valid options include:
"pls": Partial least squares regression (Default)
"rf": Random forest
"svmLinear": Support vector machine with linear kernel
"svmRadial": Support vector machine with radial kernel
best.model.metric: Metric used to decide which model is best. Must be either "RMSE" or "Rsquared".
stratified.sampling: If TRUE, training and test sets will be selected using stratified random sampling. This term is only used if test.data == NULL. Default is TRUE.
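One common way to do a stratified random split on a continuous reference value is to bin the values by quantile and sample the same proportion from each bin. The sketch below shows that idea in base R; the function name stratified_split, the n.bins parameter, and the toy reference values are assumptions for illustration, and the package may stratify differently.

```r
# Hypothetical helper: return row indices for the training set, drawing
# proportion.train of the samples from each quantile bin of `reference`.
stratified_split <- function(reference, proportion.train = 0.7,
                             n.bins = 4, seed = 1) {
  set.seed(seed)
  breaks <- unique(quantile(reference,
                            probs = seq(0, 1, length.out = n.bins + 1)))
  bins <- cut(reference, breaks = breaks, include.lowest = TRUE)
  picked <- lapply(split(seq_along(reference), bins), function(idx) {
    # Index into idx rather than calling sample(idx), which misbehaves
    # when a bin holds a single row.
    idx[sample.int(length(idx), size = round(proportion.train * length(idx)))]
  })
  sort(unname(unlist(picked)))
}

# Toy reference values: 40 evenly spaced samples.
toy.reference <- seq(20, 45, length.out = 40)
train.idx <- stratified_split(toy.reference, proportion.train = 0.7)
test.idx  <- setdiff(seq_along(toy.reference), train.idx)
```

Because each bin contributes the same fraction, the training and test sets cover a similar range of reference values, which a purely random split does not guarantee for small data sets.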
cv.scheme: A cross-validation (CV) scheme from Jarquín et al., 2017. Options for cv.scheme include:
"CV1": untested lines in tested environments
"CV2": tested lines in tested environments
"CV0": tested lines in untested environments
"CV00": untested lines in untested environments
trial1: data.frame object that is for use only when cv.scheme is provided. Contains the trial to be tested in subsequent model training functions. The first column contains unique identifiers, the second contains genotypes, and the third contains reference values, followed by the spectral columns. Include no other columns to the right of the spectra. Column names of spectra must start with "X", the reference column must be named "reference", and the genotype column must be named "genotype".
trial2: data.frame object that is for use only when cv.scheme is provided. This data.frame contains a trial whose genotypes overlap with those of trial1 but were grown in a different site/year (different environment). Formatting must be consistent with trial1.
trial3: data.frame object that is for use only when cv.scheme is provided. This data.frame contains a trial that may or may not contain genotypes that overlap with trial1. Formatting must be consistent with trial1.
split.test: Boolean that allows for a fixed training set and a split test set. For example, train a model on data from two breeding programs plus a stratified subset (70%) of a third, and test on the remaining samples (30%) of the third. If FALSE, the entire provided test set (test.data) will remain as a testing set or, if none is provided, 30% of the provided train.data will be used for testing. Default is FALSE.
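The split-test idea can be pictured with a small base R sketch: a fixed training table is kept whole, and 70% of a separate table is moved into training while the rest is held out. All object names and values here are hypothetical, and the sketch uses simple random sampling for brevity, whereas the description above refers to a stratified subset.

```r
# Hypothetical toy data: a fixed training set and a separate test set.
fixed.train   <- data.frame(unique.id = paste0("A_", 1:10),
                            reference = seq(30, 39))
provided.test <- data.frame(unique.id = paste0("B_", 1:10),
                            reference = seq(25, 43, by = 2))

# Move 70% of the provided test set into training (simple random
# sampling here; a stratified draw would be used in practice).
set.seed(1)
moved <- sample(nrow(provided.test), size = round(0.7 * nrow(provided.test)))

train.set <- rbind(fixed.train, provided.test[moved, ])
test.set  <- provided.test[-moved, ]
```

The held-out rows come only from the second table, so the reported test statistics describe performance on that population alone.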
seed: Integer to be used internally as input for set.seed(). Only used if stratified.sampling = TRUE. In all other cases, the seed is set to the current iteration number. Default is 1.
verbose: If TRUE, the number of rows removed through filtering will be printed to the console. Default is TRUE.
wavelengths: DEPRECATED. wavelengths is no longer supported; this information is now inferred from the train.data column names.
preprocessing: DEPRECATED. Please use pretreatment to specify the specific pretreatment(s) to test. For behavior identical to that of preprocessing = TRUE, set pretreatment = 1:13.
output.summary: DEPRECATED. output.summary = FALSE is no longer supported; a summary of output is always returned alongside the full performance statistics.
rf.variable.importance: DEPRECATED. rf.variable.importance = FALSE is no longer supported; variable importance results are always returned if model.method is set to "pls" or "rf".
list of 5 objects:
`model.list` is a list of trained model objects, one for each pretreatment method specified by the pretreatment argument. Each model is trained with all rows of train.data.
`summary.model.performance` is a data.frame containing summary statistics across all model training iterations and pretreatments. See below for a description of the summary statistics provided.
`model.performance` is a data.frame containing performance statistics for each iteration of model training separately (see below).
`predictions` is a data.frame containing both reference and predicted values for each test set entry in each iteration of model training.
`importance` is a data.frame containing variable importance results for each wavelength at each iteration of model training. If model.method is not "pls" or "rf", this list item is NULL.
The `summary.model.performance` and `model.performance` data.frames include the following summary statistics:
Tuned parameters depending on the model algorithm:
Best.n.comp, the best number of components
Best.ntree, the best number of trees in an RF model
Best.mtry, the best number of variables to include at every decision point in an RF model
RMSECV, the root mean squared error of cross-validation
R2cv, the coefficient of multiple determination of cross-validation for PLSR models
RMSEP, the root mean squared error of prediction
R2p, the squared Pearson’s correlation between predicted and observed test set values
RPD, the ratio of standard deviation of observed test set values to RMSEP
RPIQ, the ratio of performance to interquartile difference
CCC, the concordance correlation coefficient
Bias, the average difference between the predicted and observed values
SEP, the standard error of prediction
R2sp, the squared Spearman’s rank correlation between predicted and observed test set values
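Several of the test set statistics listed above can be reproduced from a vector of observed and predicted values. The base R sketch below follows the definitions as worded in the list, with two labeled assumptions: RPIQ is taken as the interquartile range of the observed values divided by RMSEP, and CCC is Lin's concordance correlation coefficient computed with population (1/n) moments. The function name prediction_stats is hypothetical, SEP is left out because its exact definition is not given here, and the package's own calculations may differ in detail.

```r
# Hypothetical helper: test set statistics from observed and predicted values.
prediction_stats <- function(observed, predicted) {
  residual <- predicted - observed
  rmsep <- sqrt(mean(residual^2))

  # Population (1/n) moments for Lin's CCC (assumed variant).
  s.xy <- mean((observed - mean(observed)) * (predicted - mean(predicted)))
  s.x2 <- mean((observed - mean(observed))^2)
  s.y2 <- mean((predicted - mean(predicted))^2)

  c(
    RMSEP = rmsep,
    R2p   = cor(predicted, observed)^2,            # squared Pearson correlation
    RPD   = sd(observed) / rmsep,                  # sd of observed / RMSEP
    RPIQ  = IQR(observed) / rmsep,                 # assumed: IQR of observed / RMSEP
    CCC   = 2 * s.xy / (s.x2 + s.y2 + (mean(observed) - mean(predicted))^2),
    Bias  = mean(residual),                        # mean of predicted - observed
    R2sp  = cor(predicted, observed, method = "spearman")^2
  )
}

# Toy example: every prediction is 0.5 too high.
toy.observed  <- c(1, 2, 3, 4, 5)
toy.predicted <- toy.observed + 0.5
toy.stats <- prediction_stats(toy.observed, toy.predicted)
```

In the toy example the correlations are perfect while RMSEP and Bias are both 0.5, which shows why correlation-based statistics alone cannot reveal a systematic offset.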
Calls pretreat_spectra, format_cv, and train_spectra functions.
# \donttest{
library(magrittr)
ikeogu.2017 %>%
  dplyr::rename(reference = DMC.oven,
                unique.id = sample.id) %>%
  dplyr::select(unique.id, reference, dplyr::starts_with("X")) %>%
  na.omit() %>%
  test_spectra(
    train.data = .,
    tune.length = 3,
    num.iterations = 3,
    pretreatment = 1
  )
#> Pretreatment initiated.
#> Training models...
#> Working on Raw_data
#> Returning model...
#> $model
#> Partial least squares regression, fitted with the kernel algorithm.
#> Call:
#> plsr(formula = reference ~ spectra, ncomp = tune.length, data = df.plsr)
#>
#> $summary.model.performance
#> SummaryType ModelType RMSEp R2p RPD RPIQ CCC
#> 1 mean pls 2.0440540 0.77824381 2.085018 2.6038009 0.85155095
#> 2 sd pls 0.2661027 0.02093217 0.183391 0.3062501 0.04696515
#> 3 mode pls 1.8971232 0.78270780 2.139263 2.7700624 0.86628031
#> Bias SEP RMSEcv R2cv R2sp best.ncomp best.ntree
#> 1 -0.05516781 2.0652364 1.95463427 0.7755627 0.77670789 3 NA
#> 2 0.09468830 0.2688604 0.08412512 0.0157923 0.01211986 0 NA
#> 3 -0.13588174 1.9167831 1.96423744 0.7807426 0.76866372 3 NA
#> best.mtry
#> 1 NA
#> 2 NA
#> 3 NA
#>
#> $model.performance
#> Iteration ModelType RMSEp R2p RPD RPIQ CCC
#> 1 1 pls 1.897123 0.7827078 2.139263 2.770062 0.8662803
#> 2 2 pls 1.883812 0.7965839 2.235167 2.790961 0.8893859
#> 3 3 pls 2.351227 0.7554397 1.880623 2.250380 0.7989866
#> Bias SEP RMSEcv R2cv R2sp best.ncomp best.ntree
#> 1 -0.13588174 1.916783 1.964237 0.7807426 0.7686637 3 NA
#> 2 0.04906262 1.903334 2.033546 0.7578310 0.7708123 3 NA
#> 3 -0.07868431 2.375593 1.866120 0.7881145 0.7906476 3 NA
#> best.mtry
#> 1 NA
#> 2 NA
#> 3 NA
#>
#> $predictions
#> Iteration ModelType unique.id reference predicted
#> 1 1 pls C16Mcal_3 42.0446205 39.7341295535482
#> 2 1 pls C16Mcal_11 35.2299995 37.0584291524637
#> 3 1 pls C16Mcal_14 42.2379684 41.2378709425078
#> 4 1 pls C16Mcal_16 36.3796349 38.4719648775467
#> 5 1 pls C16Mcal_17 36.6281929 38.3893405413672
#> 6 1 pls C16Mcal_21 37.6122742 37.5876338455539
#> 7 1 pls C16Mcal_23 37.1399994 36.1622873913981
#> 8 1 pls C16Mcal_24 42.1911201 40.0020104063215
#> 9 1 pls C16Mcal_28 29.2099991 31.808857469154
#> 10 1 pls C16Mcal_34 42.5299988 38.7208091511507
#> 11 1 pls C16Mcal_36 36.4031143 36.246812495919
#> 12 1 pls C16Mcal_37 36.7437744 37.0942122518994
#> 13 1 pls C16Mcal_38 33.7883987 35.8131640124789
#> 14 1 pls C16Mcal_40 36.1300011 36.9447531214695
#> 15 1 pls C16Mcal_46 32.3829803 33.3493674228586
#> 16 1 pls C16Mcal_50 37.7099991 34.0582110463825
#> 17 1 pls C16Mcal_52 39.11203 37.9982998877743
#> 18 1 pls C16Mcal_53 41.4599991 39.5928476620954
#> 19 1 pls C16Mcal_64 27.2600002 28.690815010551
#> 20 1 pls C16Mcal_68 33.1699982 34.0802252970327
#> 21 1 pls C16Mcal_70 39.5100403 37.6869323999689
#> 22 1 pls C16Mcal_77 33.8568802 29.7188404153879
#> 23 1 pls C16Mcal_78 39.3699989 37.6632226605092
#> 24 1 pls C16Mcal_79 35.9900017 38.7002568474304
#> 25 1 pls C16Mcal_82 34.8699989 30.4030934524671
#> 26 1 pls C16Mcal_83 34.1562042 33.3061746818153
#> 27 1 pls C16Mcal_85 33.1330147 32.4015416610514
#> 28 1 pls C16Mcal_89 40.6085129 39.9318954639371
#> 29 1 pls C16Mcal_92 34.4248657 34.7700364144324
#> 30 1 pls C16Mcal_107 38.8699989 37.1385397881858
#> 31 1 pls C16Mcal_109 28.9400005 31.3652388059175
#> 32 1 pls C16Mcal_111 37.6399994 36.8955816318098
#> 33 1 pls C16Mcal_113 33.4700012 35.5350097538434
#> 34 1 pls C16Mcal_116 43.2331314 41.7466397448865
#> 35 1 pls C16Mval_2 43.7411308 41.3697207058367
#> 36 1 pls C16Mval_5 41.7549706 42.0425597521298
#> 37 1 pls C16Mval_10 39.3955345 37.6258211459777
#> 38 1 pls C16Mval_13 38.6752853 37.2510250165379
#> 39 1 pls C16Mval_15 38.5099792 39.0789254447938
#> 40 1 pls C16Mval_17 38.3962402 38.2769583394128
#> 41 1 pls C16Mval_25 36.5172691 36.4620805976155
#> 42 1 pls C16Mval_28 36.3800011 36.8373837961633
#> 43 1 pls C16Mval_32 35.9099998 37.1844223289279
#> 44 1 pls C16Mval_39 34.5699997 37.112799121684
#> 45 1 pls C16Mval_40 34.2991219 34.3335500159472
#> 46 1 pls C16Mval_46 33.4192848 34.1640920673306
#> 47 1 pls C16Mval_47 31.1025772 34.4603465331052
#> 48 1 pls C16Mval_49 30.8113632 33.0971258868265
#> 49 1 pls C16Mval_53 27.3490391 28.008487137467
#> 50 2 pls C16Mcal_4 39.0099869 36.9721445534137
#> 51 2 pls C16Mcal_12 41.9791336 40.8186959650909
#> 52 2 pls C16Mcal_19 39.7091141 38.626203555885
#> 53 2 pls C16Mcal_22 41.2799988 40.5777161867026
#> 54 2 pls C16Mcal_23 37.1399994 36.6069430215017
#> 55 2 pls C16Mcal_24 42.1911201 39.7716650588323
#> 56 2 pls C16Mcal_25 31.7656345 34.1048344379296
#> 57 2 pls C16Mcal_29 39.6450653 40.5048095505648
#> 58 2 pls C16Mcal_33 34.9757233 34.0910898624338
#> 59 2 pls C16Mcal_41 39.9312401 39.2384874302655
#> 60 2 pls C16Mcal_42 34.7200012 34.2219028602118
#> 61 2 pls C16Mcal_45 29.9400005 31.665033656594
#> 62 2 pls C16Mcal_49 38.2799988 37.9975383380893
#> 63 2 pls C16Mcal_50 37.7099991 34.5142103098015
#> 64 2 pls C16Mcal_53 41.4599991 39.2719913629993
#> 65 2 pls C16Mcal_63 30.8999996 31.735227722246
#> 66 2 pls C16Mcal_67 36.2628212 37.821088342659
#> 67 2 pls C16Mcal_69 31.9607944 30.1513898880451
#> 68 2 pls C16Mcal_71 40.0252266 38.7768290358187
#> 69 2 pls C16Mcal_73 29.9218559 31.2831883942071
#> 70 2 pls C16Mcal_74 32.0927048 34.9271074692052
#> 71 2 pls C16Mcal_80 31.5499992 33.8737717856748
#> 72 2 pls C16Mcal_81 37.6691475 38.7673884545834
#> 73 2 pls C16Mcal_84 34.1899986 33.678669967893
#> 74 2 pls C16Mcal_87 35.0520935 38.026179600089
#> 75 2 pls C16Mcal_89 40.6085129 39.6314522574414
#> 76 2 pls C16Mcal_93 34.8271332 35.7453158525005
#> 77 2 pls C16Mcal_96 36.2366486 39.1518658417376
#> 78 2 pls C16Mcal_103 39.3523369 39.2231309364537
#> 79 2 pls C16Mcal_109 28.9400005 31.6287841708249
#> 80 2 pls C16Mcal_110 23.5921307 19.2719586527373
#> 81 2 pls C16Mcal_115 37.8899994 34.0844299539589
#> 82 2 pls C16Mcal_121 34.3344879 35.4271153607386
#> 83 2 pls C16Mval_8 39.8222618 37.7519771007149
#> 84 2 pls C16Mval_10 39.3955345 37.5400073024118
#> 85 2 pls C16Mval_11 38.898819 38.0622226951498
#> 86 2 pls C16Mval_16 38.4863548 38.2158385619491
#> 87 2 pls C16Mval_18 38.1199989 39.7560516216508
#> 88 2 pls C16Mval_20 37.9215317 37.6329377937141
#> 89 2 pls C16Mval_22 37.4799995 35.8860319129793
#> 90 2 pls C16Mval_23 37.3499985 37.8586343778337
#> 91 2 pls C16Mval_31 36.0321198 35.0370286129604
#> 92 2 pls C16Mval_32 35.9099998 37.6957989409211
#> 93 2 pls C16Mval_33 35.4645767 33.942650482815
#> 94 2 pls C16Mval_38 34.6531563 37.3115954645114
#> 95 2 pls C16Mval_44 33.7523422 35.3383323430282
#> 96 2 pls C16Mval_49 30.8113632 33.0505325050558
#> 97 2 pls C16Mval_51 28.3097172 32.020432547462
#> 98 2 pls C16Mval_53 27.3490391 28.0155555233523
#> 99 3 pls C16Mcal_9 38.1199989 37.6136268303616
#> 100 3 pls C16Mcal_10 31.7993336 33.9960798905344
#> 101 3 pls C16Mcal_12 41.9791336 40.7223193329217
#> 102 3 pls C16Mcal_14 42.2379684 40.0322858749734
#> 103 3 pls C16Mcal_24 42.1911201 38.7410403428496
#> 104 3 pls C16Mcal_32 37.4700012 37.8085759518241
#> 105 3 pls C16Mcal_34 42.5299988 38.7895688013236
#> 106 3 pls C16Mcal_41 39.9312401 39.2703570299875
#> 107 3 pls C16Mcal_44 43.2962227 38.9882580121138
#> 108 3 pls C16Mcal_45 29.9400005 33.0227030066139
#> 109 3 pls C16Mcal_54 35.0544205 34.7842135784515
#> 110 3 pls C16Mcal_56 38.4799995 37.650172422375
#> 111 3 pls C16Mcal_59 28.9799995 35.7692039040073
#> 112 3 pls C16Mcal_60 36.6251221 37.1083182285589
#> 113 3 pls C16Mcal_64 27.2600002 28.3550462617746
#> 114 3 pls C16Mcal_65 38.8069687 37.039754641379
#> 115 3 pls C16Mcal_72 40.1213608 36.9984744249914
#> 116 3 pls C16Mcal_73 29.9218559 31.5143515084898
#> 117 3 pls C16Mcal_76 35.254776 34.5368763296041
#> 118 3 pls C16Mcal_79 35.9900017 39.1099151476543
#> 119 3 pls C16Mcal_80 31.5499992 33.9930967456893
#> 120 3 pls C16Mcal_85 33.1330147 33.5299983423354
#> 121 3 pls C16Mcal_88 34.6599998 36.358740645514
#> 122 3 pls C16Mcal_92 34.4248657 35.4589905398041
#> 123 3 pls C16Mcal_93 34.8271332 35.7826209715778
#> 124 3 pls C16Mcal_97 34.2159386 35.2912693881478
#> 125 3 pls C16Mcal_100 34.8311043 36.1969705259602
#> 126 3 pls C16Mcal_101 41.6001511 39.3333420265112
#> 127 3 pls C16Mcal_104 35.1094627 34.3670903613702
#> 128 3 pls C16Mcal_105 41.1385422 37.4734946714685
#> 129 3 pls C16Mcal_108 39.1699982 38.2254508879645
#> 130 3 pls C16Mcal_109 28.9400005 32.3303651691149
#> 131 3 pls C16Mcal_111 37.6399994 36.6151814290969
#> 132 3 pls C16Mcal_112 40.9370956 38.8778069984782
#> 133 3 pls C16Mcal_115 37.8899994 36.007553828791
#> 134 3 pls C16Mcal_119 31.1599998 29.0166052416485
#> 135 3 pls C16Mval_4 43.1399994 40.3666684724336
#> 136 3 pls C16Mval_10 39.3955345 37.1111538375656
#> 137 3 pls C16Mval_12 38.6911316 38.9907619623887
#> 138 3 pls C16Mval_20 37.9215317 37.6017574596303
#> 139 3 pls C16Mval_22 37.4799995 35.7117317305532
#> 140 3 pls C16Mval_24 36.7746239 36.8383894043584
#> 141 3 pls C16Mval_28 36.3800011 36.744908213705
#> 142 3 pls C16Mval_29 36.2344589 36.2652962480481
#> 143 3 pls C16Mval_34 35.1879196 32.7954860929976
#> 144 3 pls C16Mval_42 33.8788452 35.6942086569854
#> 145 3 pls C16Mval_49 30.8113632 33.4093703124484
#> 146 3 pls C16Mval_52 28.0000305 34.2648127570954
#> 147 3 pls C16Mval_53 27.3490391 28.1015099612564
#>
#> $importance
#> # A tibble: 3 × 2,153
#> Iteration ModelType X350 X351 X352 X353 X354 X355 X356 X357
#> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 pls 0.0258 0.0245 0.0278 0.0268 0.0265 0.0311 0.0343 0.0312
#> 2 2 pls 0.0351 0.0344 0.0329 0.0315 0.0328 0.0365 0.0416 0.0404
#> 3 3 pls 0.0241 0.0251 0.0256 0.0258 0.0253 0.0292 0.0324 0.0310
#> # ℹ 2,143 more variables: X358 <dbl>, X359 <dbl>, X360 <dbl>, X361 <dbl>,
#> # X362 <dbl>, X363 <dbl>, X364 <dbl>, X365 <dbl>, X366 <dbl>, X367 <dbl>,
#> # X368 <dbl>, X369 <dbl>, X370 <dbl>, X371 <dbl>, X372 <dbl>, X373 <dbl>,
#> # X374 <dbl>, X375 <dbl>, X376 <dbl>, X377 <dbl>, X378 <dbl>, X379 <dbl>,
#> # X380 <dbl>, X381 <dbl>, X382 <dbl>, X383 <dbl>, X384 <dbl>, X385 <dbl>,
#> # X386 <dbl>, X387 <dbl>, X388 <dbl>, X389 <dbl>, X390 <dbl>, X391 <dbl>,
#> # X392 <dbl>, X393 <dbl>, X394 <dbl>, X395 <dbl>, X396 <dbl>, X397 <dbl>, …
#>
# }