HORTSCIENCE
https://doi.org/10.21273/HORTSCI16015-21

Discrimination of Salix caprea, Salix gracilistyla, and Their Interspecific Hybrid Using Vegetative Characteristics and Partial Least Squares Discriminant Analysis

Han-Na Seo, Hyo-In Lim, Yong-Yul Kim, and Seung-Beom Chae
Forest Bioinformation Division, National Institute of Forest Science, Suwon, Korea 16631

Wonwoo Cho
Forest Tree Improvement Division, National Institute of Forest Science, Suwon, Korea 16631

Additional index words. accuracy, classification, DUS test, hybridization, variable influence on projection, VIP, variety

Abstract. Identifying the morphological characteristics that distinguish plant varieties is an important issue for plant breeders and researchers. The objective of the present study was to create a partial least squares discrimination analysis (PLS-DA) model with morphological characteristics for species discrimination and to select the characteristics most important for species discrimination. Data for 27 vegetative characteristics were obtained from Salix caprea, Salix gracilistyla, and their interspecific hybrid (S. caprea × S. gracilistyla) and used for PLS-DA. According to this analysis, seven of the 27 characteristics were identified as those that most influenced species discrimination, and the PLS-DA model with these seven characteristics had a classification accuracy of 86% to 100%. The classification performance of this model was not significantly different from that of the model with all 27 characteristics (full model). Therefore, these results indicated that the three species can be relatively well distinguished by the seven characteristics extracted by PLS-DA. In addition, the selected characteristics can be used to select cross-breeding parents in subsequent breeding programs and to test the distinctness, uniformity, and stability (DUS test) of the hybrid variety.
From this perspective, PLS-DA is thought to be a useful methodology for classifying new plant varieties and providing information for breeding.

Received for publication 24 May 2021. Accepted for publication 29 June 2021. Published online 3 September 2021. H.-I.L. is the corresponding author. E-mail: iistorm@korea.kr. This is an open access article distributed under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).

According to the International Union for the Protection of New Varieties of Plants (UPOV), protection of new varieties can only be granted if the DUS test proves that their expressed characteristics differ from those of any other variety (UPOV, 2002). Therefore, plant breeders and researchers are focused on finding morphological characteristics that can distinguish a new variety from other varieties and can explain the overall features of the variety well. This is mainly because these characteristics can be used to test the DUS of different breeds as well as to select crossbreeding parents in subsequent breeding programs and to preserve genetic resources (Korir et al., 2012).

From a statistical point of view, the process of extracting characteristics that can distinguish a given variety from others is a central topic of discrimination analysis rather than of cluster analysis (Kuhn and Johnson, 2013). Linear discriminant analysis (LDA) is the most commonly used method to find a linear combination of characteristics that can be used to discriminate two or more classes of varieties (Galdon et al., 2012). The resulting linear combination can be used as a classifier for the classification of varieties. In addition, because LDA is performed as a multiple linear regression model using characteristics as explanatory variables, it has the advantage of being able to compare the relative influence of each characteristic on the classification of the varieties (Bruce and Bruce, 2017).
However, LDA has a drawback in that the accuracy of the model is decreased by the multicollinearity and dimensionality problems that arise when multiple correlated variables outnumber the observations used. As an alternative method, principal component analysis combined with linear discriminant analysis (PCA-LDA) has often been used; this analysis applies the LDA to principal components (latent variables) from the PCA rather than to the original variables (De Luca et al., 2012). On the other hand, in the field of chemometrics and metabolomics research, PLS-DA has been widely used for discrimination, classification, and authenticity identification of a target object (Fonville et al., 2010; Hur et al., 2015; Kwon et al., 2014; Yan et al., 2014). Recently, research on the classification of cultivars using PLS-DA has also been conducted in the plant field (Kong et al., 2013; Shrestha et al., 2016). PLS-DA is effective in selecting the characteristics most relevant to solving classification problems (Ruiz-Perez et al., 2020). In particular, PLS-DA has an advantage in that it is free of multicollinearity and dimensionality problems (Barker and Rayens, 2003).

S. caprea and S. gracilistyla are deciduous broadleaf willow species native to Korea (Lee, 2003). S. caprea is a small tree growing in wetlands or lower parts of mountains, and it is known to be suitable for landscape restoration (Vaculík et al., 2012; Wu and Raven, 1999). S. gracilistyla is a shrub that grows in wetlands (or by the water) and mountain valleys, and it is known to invade restored areas quickly after the restoration of wetlands (Cho et al., 2008; Choi and Kim, 2015) and to flower precociously (Wu and Raven, 1999). Recently, the National Institute of Forest Research has cross-bred S. caprea and S. gracilistyla to develop high biomass productivity varieties.
A study using PCA to analyze 21 flower characteristics (12 for female flowers and nine for male flowers) showed that S. caprea, S. gracilistyla, and their interspecific hybrid were distinguishable from each other (Seo et al., 2021). The characteristics of vegetative organs are also very important for testing distinctness, uniformity, and stability (the DUS test) according to the International Union for the Protection of New Varieties of Plants (UPOV) Convention. For example, in the guidelines for conducting DUS tests for willow (Salix L.) developed by the UPOV, 20 of 23 characteristics are those of vegetative organs, such as leaves and branches (UPOV, 2006). The guidelines for goat willow (S. caprea L.) developed by the Korea Forest Service also presented 14 characteristics of vegetative organs (Korea NFSV, 2019). Nevertheless, to date, no studies have been conducted to discriminate and classify S. caprea, S. gracilistyla, and their interspecific hybrid (S. caprea × S. gracilistyla) using vegetative characteristics.

In the present study, a PLS-DA model was created to discriminate and classify the two willow species and their interspecific hybrid using 27 characteristics of vegetative organs. In addition, a set of characteristics that most influenced the discrimination and classification of S. caprea, S. gracilistyla, and their interspecific hybrid was extracted so that it can be used to select cross-breeding parents in subsequent breeding programs and to test the DUS of the hybrid variety.

Materials and Methods

Sample collection and measurement of vegetative characteristics. A total of 100 trees of S. caprea × S. gracilistyla (SH) were used in this study. They were sampled from a population of single full-sib progenies obtained in 2015 by a cross between one female tree of S. caprea (SC) and one male tree of S. gracilistyla (SG). The progenies were 5 years old and grew at an experimental site of the National Institute of Forest Science in Suwon City, Korea. Thirty-five trees of each species (SC and SG) were sampled from two natural populations at Gangneung City (for SC) and Chuncheon City (for SG) in Gangwon Province, Korea. When possible, mature trees were selected to minimize observation of immature characteristics. Twenty-seven characteristics of four vegetative organs (leaves, stipules, branchlets, and winter buds) in 170 trees (100 for SH and 35 each for SC and SG) were measured (Table 1) as described in Wu and Raven (1999), UPOV (2006), and Korea NFSV (2019). Nineteen of the 27 characteristics were quantitative, and eight were qualitative. Details of the names, abbreviations, and measurement units (expression states for qualitative characteristics) of the 27 characteristics are given in Table 1, and the relevant characteristics are shown in Fig. 1 (for 19 quantitative characteristics) and Fig. 2 (for eight qualitative characteristics). All measurements were completed between July and August 2020.

Table 1. Twenty-seven vegetative characteristics (19 quantitative and eight qualitative) of Salix caprea (SC), Salix gracilistyla (SG), and their interspecific hybrid (SH) along with measurement units or states of expression.

Organ | Characteristic | Abbreviation | Measurement unit or state of expression
Leaf | Leaf length | LL | cm
Leaf | Leaf width | LW | cm
Leaf | Leaf length/width ratio | LR | ratio
Leaf | Leaf width upper 1/3 | LWU | cm
Leaf | Leaf width lower 1/3 | LWL | cm
Leaf | Leaf base angle | LB | degree (°)
Leaf | Leaf head angle | LH | degree (°)
Leaf | Petiole length | LPL | mm
Leaf | Petiole width | LPW | mm
Leaf | Leaf thickness | LT | mm
Leaf | Lateral vein number | LVN | number
Leaf | Leaf lower hair length | HL | mm
Leaf | Number of leaf lower hairs per unit area | HN | number
Leaf | Leaf margin type | LM | irregular or serrate
Leaf | Lateral vein type | LV | joining together or not joining together
Leaf | Leaf lower hair type | LBH | curly or straight
Stipule | Stipule length | SL | mm
Stipule | Stipule width | SW | mm
Stipule | Stipule length/width ratio | SR | ratio
Stipule | Stipule serration number | SN | number
Stipule | Stipule margin type | SM | irregular or serrate
Branchlet | Branchlet hair type | BH | glabrous, glabrous or tomentose, and tomentose
Branchlet | Branchlet color | BC | yellow or red
Winter bud | Winter bud length | BL | mm
Winter bud | Winter bud width | BW | mm
Winter bud | Winter bud hair type | WBH | glabrous or tomentose
Winter bud | Winter bud color | WBC | red or yellow

Statistical analysis. The agricolae package in R (De Mendiburu and Simon, 2015) was used to calculate basic descriptive statistics for the 19 quantitative characteristics and to conduct analysis of variance (ANOVA) and Duncan's multiple range test. Before conducting the PLS-DA, the data set for the 27 characteristics of the 170 trees of SC, SG, and SH (three classes) was divided into two subsets: a training set (70%) and a testing set (30%). The data were partitioned by species-level stratified random sampling without replacement using the caret package in R (Kuhn, 2008). The training set comprised 120 observations (25 observations each for SC and SG, and 70 for SH), and the testing set comprised 50 observations (10 observations each for SC and SG, and 30 for SH). All PLS-DA processes were performed using the mdatools package in R (Kucheryavskiy, 2020). The following model equation was used for the PLS-DA as described in Brereton et al. (2018): $Y = XB + E$, where Y is a matrix of the response (the three classes), X is a matrix of centered and scaled predictor variables (27 characteristics), B is a matrix of regression coefficients of the predictor variables, and E is a matrix of error terms (residuals). The SIMPLS algorithm (a statistically inspired modification of the PLS method), as implemented in the mdatools package in R, was used to decompose the X and Y matrices and to compute scores, loadings, and residuals according to the following equations, as described in Kucheryavskiy (2021) and Peerbhay et al.
(2013): $X = TP + E_x$ and $Y = UQ + E_y$, where T and U are the factor score matrices, P and Q are the loading matrices, and $E_x$ and $E_y$ are the residuals. Cross-validation was conducted on the training set using the leave-one-out cross-validation (LOOCV) method (Kucheryavskiy, 2021; Mabood et al., 2017). An optimal number of components (latent variables) was selected by comparing the root mean square error (RMSE), coefficient of determination (R2), and classification accuracy of each model generated by the LOOCV method. Using the selected optimal number of latent variables for the 27 characteristics, the first PLS-DA model (full model) was created and then fitted to the training set. The overall performance of the first model was evaluated by reviewing statistics, such as the values of RMSE, R2, and accuracy. In particular, the variable influence on projection (VIP) scores of each characteristic were computed and then used to select the most influential characteristics that can simplify the PLS-DA model and improve performance (Chong and Jun, 2005; Peerbhay et al., 2013; Perez-Enciso and Tenenhaus, 2003). Regression coefficients and their corresponding P values were used along with the VIP scores to select the predictor variables. The criterion for variable selection used in this study was that the VIP score is greater than 1.0 and the P value of the regression coefficient is less than 0.05 for at least two of the three classes. The second PLS-DA model (reduced model) was created using the selected optimal number of components and the set of most influential characteristics, and it was then fitted to the training set. The overall performance of the second model was evaluated as described for the first model. The second model was then applied to the testing set, and the predicted values for each observation included in the testing set were computed and used to create a confusion matrix.
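The 70%/30% species-stratified partition described above can be illustrated with a short sketch. This is a standard-library Python stand-in, not the caret routine the authors used; the function name `stratified_split`, the seed, and the choice to round the per-class training count up are assumptions made here, the last one because it reproduces the reported 25/10 and 70/30 counts.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, train_percent=70, seed=0):
    """Split indices into train/test lists, sampling each class separately.

    Assumption: within each class, ceil(train_percent% of n) indices are
    drawn without replacement for training. This rounding is chosen only
    because it matches the class counts reported in the text.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        n_train = math.ceil(train_percent * len(members) / 100)
        chosen = set(rng.sample(members, n_train))
        train.extend(i for i in members if i in chosen)
        test.extend(i for i in members if i not in chosen)
    return train, test


# Class sizes from the study: 35 SC, 35 SG, and 100 SH trees.
labels = ["SC"] * 35 + ["SG"] * 35 + ["SH"] * 100
train_idx, test_idx = stratified_split(labels)

classes = ("SC", "SG", "SH")
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in classes}
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in classes}
print(train_counts)  # {'SC': 25, 'SG': 25, 'SH': 70}
print(test_counts)   # {'SC': 10, 'SG': 10, 'SH': 30}
```

Sampling within each class keeps the 35:35:100 class proportions nearly identical in both subsets, which matters here because the hybrid class is almost three times larger than either parent class.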
The confusion matrix was structured with four cases of classification: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). TP is the number of cases in which the given class is correctly classified as in-class, TN is the number of cases in which the other class is correctly classified as out-class, FN is the number of cases in which the given class is incorrectly classified as out-class, and FP is the number of cases in which the other class is incorrectly classified as in-class (Ballabio and Consonni, 2013; Sroute et al., 2020). The values of specificity, sensitivity, and accuracy were computed and used to evaluate the classification performance of the second model.

Results

Comparison of vegetative characteristics. Means, standard deviations, one-way ANOVAs, and Duncan’s multiple range tests of the 19 quantitative characteristics of the three species are shown in Table 2. There were significant mean differences among the three species in 17 of the 19 characteristics. As shown in Fig. 2, SC and SG differed in leaf size and shape; SC had large, ovate-oblong leaves, whereas SG had relatively small and narrow elliptic-oblong leaves, and SH had leaves of intermediate form. These differences were reflected in the five characteristics related to leaf size (LL, LW, LWU, LWL, and LB); the mean values of these characteristics were higher in SC than in SG and SH, and the differences were significant according to Duncan’s multiple range test (Table 2). In the other five characteristics (LPL, LT, LVN, HL, and SW), SC also had significantly higher mean values than those in SG and SH. In only four characteristics (LR, LPW, HN, and SR), SG had higher mean values than those in SC, whereas SH showed intermediate characteristics between SC and SG. On the other hand, in another three characteristics (SN, BL, and BW), SH had higher mean values than those in SC and SG according to Duncan’s multiple range test.

Fig. 1. Quantitative morphological characteristics of leaves, stipules, and winter buds of the three studied species. (A) Salix caprea leaf, (B) interspecific hybrid leaf, (C) Salix gracilistyla leaf, (D) Salix caprea stipule, (E) interspecific hybrid stipule, (F) Salix gracilistyla stipule, (G) Salix caprea winter bud, (H) interspecific hybrid winter bud, and (I) Salix gracilistyla winter bud. Abbreviations of characteristics are listed in Table 1.

In five qualitative characteristics (LV, LBH, BC, WBH, and WBC), all the SCs showed only the SC type, indicating 100% uniformity. In another three qualitative characteristics (SM, LM, and BH), SC showed 77%, 89%, and 97% uniformity, respectively (Fig. 3). SG also showed only the SG type in four qualitative characteristics (LV, LBH, SM, and WBH). Among three qualitative characteristics (LM, BH, and BC), SG had 97%, 83%, and 60% uniformity, respectively. In the WBC of SG, the frequency of the SG type was less than 14%. SH had either SC or SG types in seven qualitative characteristics, except for WBC. However, the proportions of the SC and SG types in the SH population varied by characteristic: in three qualitative characteristics (LV, LBH, and SM), the proportions of the SHs with the SC and SG types were similar; on the other hand, in another four qualitative characteristics (LM, BH, BC, and WBH), the proportion of the SHs with the SG type was higher than the proportion of the SHs with the SC type. Overall, in the seven qualitative characteristics other than WBC, more SHs appeared to resemble SG than SC.

Partial least squares discrimination analysis. The values of the RMSE and accuracy for each PLS-DA model generated from the cross-validation (LOOCV) performed with the maximum number of latent variables (components), which was seven, are given in Table 3 and Fig. 4.
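As a small worked illustration of how the LOOCV results can be read, the sketch below takes the RMSE values listed in Table 3 and computes how much each additional component reduces the RMSE. The helper name `rmse_gains` is hypothetical, and this arithmetic is only a reading aid, not the authors' selection procedure, which also considered R2 and accuracy.

```python
# LOOCV RMSE by number of components (1 to 7), copied from Table 3.
rmse = {
    "SC": [0.3598, 0.3020, 0.2372, 0.2286, 0.2057, 0.2056, 0.2001],
    "SG": [0.7137, 0.4144, 0.4074, 0.3478, 0.3466, 0.3366, 0.3366],
    "SH": [0.9254, 0.5033, 0.4910, 0.4067, 0.4003, 0.3924, 0.3901],
}


def rmse_gains(values):
    """Return the RMSE reduction obtained by adding each further component."""
    return [round(prev - curr, 4) for prev, curr in zip(values, values[1:])]


# gains[species][k] is the reduction when going from k + 1 to k + 2 components.
gains = {species: rmse_gains(values) for species, values in rmse.items()}
for species, reductions in gains.items():
    print(species, reductions)
```

For SG and SH, adding the fourth component lowers the RMSE by 0.0596 and 0.0843, respectively, whereas no later component lowers it by more than 0.0100.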
For all species (three classes), the rate of decrease in the RMSE values gradually slowed beyond four latent variables (RMSE at four latent variables: 0.2286 for SC, 0.3478 for SG, and 0.4067 for SH), and the discriminant accuracy of the models with more than four latent variables showed no significant difference (accuracy at four latent variables: 1.0 for SC, 0.992 for SG, and 1.0 for SH). In terms of model interpretation, stability, and classification performance, four seemed to be the optimal number of latent variables (Ballabio and Consonni, 2013). Thus, four latent variables were used for the subsequent PLS-DA in the present study.

Fig. 2. Qualitative morphological characteristics of leaves, stipules, branchlets, and winter buds of Salix caprea (SC), Salix gracilistyla (SG), and their interspecific hybrid (SH). Abbreviations of characteristics are listed in Table 1.

Table 2. Summary of quantitative vegetative characteristics of Salix caprea (SC), Salix gracilistyla (SG), and their interspecific hybrid (SH).

Organ | Characteristic (z) | ANOVA (y) | SC | SH | SG
Leaf | LL | *** | 11.05 ± 1.70 (x) a (w) | 9.34 ± 1.38 c | 10.04 ± 1.37 b
Leaf | LW | *** | 4.96 ± 0.67 a | 3.20 ± 0.56 b | 2.80 ± 0.50 c
Leaf | LR | *** | 2.26 ± 0.38 c | 2.95 ± 0.36 b | 3.63 ± 0.39 a
Leaf | LWU | *** | 3.85 ± 0.60 a | 2.80 ± 0.49 b | 2.60 ± 0.46 b
Leaf | LWL | *** | 4.03 ± 0.67 a | 2.93 ± 0.56 b | 2.54 ± 0.49 c
Leaf | LB | *** | 114.57 ± 28.28 a | 86.74 ± 20.81 b | 65.35 ± 16.72 c
Leaf | LH | NS | 55.58 ± 31.85 | 50.37 ± 16.60 | 52.25 ± 12.82
Leaf | LPL | *** | 18.86 ± 4.02 a | 13.25 ± 2.83 b | 11.43 ± 2.83 c
Leaf | LPW | ** | 1.51 ± 0.38 b | 1.50 ± 0.26 b | 1.72 ± 0.28 a
Leaf | LT | *** | 0.11 ± 0.03 a | 0.09 ± 0.02 b | 0.11 ± 0.02 a
Leaf | LVN | ** | 12.80 ± 1.98 a | 11.83 ± 1.66 b | 12.46 ± 1.56 ab
Leaf | HL | *** | 0.57 ± 0.12 a | 0.45 ± 0.13 b | 0.32 ± 0.09 c
Leaf | HN | *** | 16.86 ± 8.95 b | 45.51 ± 22.57 a | 41.23 ± 16.23 a
Stipule | SL | NS | 6.26 ± 1.68 | 6.56 ± 1.88 | 7.25 ± 1.70
Stipule | SW | *** | 4.20 ± 1.13 a | 3.32 ± 1.06 b | 2.97 ± 0.85 b
Stipule | SN | *** | 10.06 ± 3.23 b | 13.63 ± 3.80 a | 10.89 ± 4.86 b
Stipule | SR | *** | 1.51 ± 0.22 c | 2.02 ± 0.31 b | 2.53 ± 0.58 a
Winter bud | BL | *** | 7.85 ± 1.99 c | 16.29 ± 3.58 a | 12.82 ± 2.70 b
Winter bud | BW | *** | 3.99 ± 0.82 b | 4.42 ± 0.81 a | 3.63 ± 0.53 c

z Abbreviations of characteristics are the same as those in Table 1.
y Analysis of variance test (nonsignificant, NS; significance levels: *P < 0.05, **P < 0.01, ***P < 0.001).
x Mean ± SD.
w Duncan’s multiple range test (significant at P < 0.05).

The first PLS-DA model with 27 predictor variables (characteristics) using four latent variables explained 85.6% of the total variance in the Y response variable (the three classes) (Table 4). The values of the coefficient of determination (R2) and RMSE of this model varied by class, where SC had higher R2 and lower RMSE values than those of SG and SH. This model also showed 100% classification accuracy for both SC and SH, but a slightly lower accuracy (99.2%) for SG. The VIP scores of each of the 27 characteristics for the three classes, which were obtained from the first PLS-DA, are shown in Fig. 5. These values varied according to class and characteristic. Given that a VIP value of 1.0 was the cutoff criterion for variable selection, as suggested in many related studies (Chong and Jun, 2005; Rajalahti et al., 2009; Wold et al., 2001), a total of 14 characteristics in SC, nine in SG, and 10 in SH could be selected based on this criterion. Only six characteristics (LR, BL, LV, LBH, WBH, and BC) had VIP values higher than 1.0 in all three classes. Although it seemed reasonable to use only these six characteristics to create a new reduced PLS-DA model according to the widely used method of VIP-based variable selection, it is possible that such an extremely reduced number of characteristics would decrease the discrimination performance of the new model (Rajalahti et al., 2009; Villa et al., 2019).
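The difference between requiring a characteristic to pass in all three classes and the criterion stated in Materials and Methods (VIP score above 1.0 and P value below 0.05 in at least two of the three classes) can be sketched as a simple counting rule. The function name `select_characteristics`, the placeholder characteristic names X1 to X3, and all VIP and P values below are hypothetical and are not taken from the study.

```python
CLASSES = ("SC", "SG", "SH")


def select_characteristics(vip, p_values, min_classes=2, vip_cutoff=1.0, alpha=0.05):
    """Keep characteristics that pass both thresholds in enough classes.

    vip and p_values map characteristic name -> {class name: value}.
    """
    selected = []
    for name in vip:
        passing = sum(
            1
            for cls in CLASSES
            if vip[name][cls] > vip_cutoff and p_values[name][cls] < alpha
        )
        if passing >= min_classes:
            selected.append(name)
    return selected


# Hypothetical values for three placeholder characteristics (NOT study data).
vip = {
    "X1": {"SC": 1.4, "SG": 1.2, "SH": 1.1},
    "X2": {"SC": 1.3, "SG": 0.8, "SH": 1.2},
    "X3": {"SC": 1.5, "SG": 0.7, "SH": 0.9},
}
p_values = {
    "X1": {"SC": 0.001, "SG": 0.010, "SH": 0.020},
    "X2": {"SC": 0.002, "SG": 0.300, "SH": 0.040},
    "X3": {"SC": 0.001, "SG": 0.400, "SH": 0.200},
}

two_of_three = select_characteristics(vip, p_values, min_classes=2)
all_three = select_characteristics(vip, p_values, min_classes=3)
print(two_of_three)  # ['X1', 'X2']
print(all_three)     # ['X1']
```

In this toy input, relaxing the requirement from three classes to two retains one additional characteristic, which is the kind of trade-off between model simplicity and discrimination performance discussed in the text.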
Thus, in the present study, only the characteristics with VIP values higher than 1.0 and P values of the regression coefficient less than 0.05 in at least two classes were selected and used to create the second model (i.e., the reduced model). Based on the variable selection using both VIP values and P values, seven characteristics (LR, SN, BL, LV, LBH, WBH, and BC) were finally selected; the first three were quantitative, and the remaining four were qualitative. Compared with the first PLS-DA model, the second PLS-DA model with these seven characteristics (LR, SN, BL, LV, LBH, WBH, and BC) using four latent variables showed poorer values in all statistics, including the total variability explained by the model (77.7%), R2 (89.7% for SC, 67.5% for SG, and 75.9% for SH), RMSE (0.2608 for SC, 0.4634 for SG, and 0.4840 for SH), and classification accuracy (100% for SC, 97% for SG, and 95% for SH) (Table 4). The decline in all statistics in the second model seemed to be an inevitable consequence of using a reduced number of variables. However, the second model was selected and used for the subsequent classification of the three classes, mainly because this model showed a discrimination accuracy sufficient for classification, on the assumption that a classification error rate of no more than 5% is acceptable.

Table 3. Results of the leave-one-out cross-validation (LOOCV) for the partial least squares discrimination analysis (PLS-DA) model with all 27 predictor variables on the training dataset by species [Salix caprea (SC), Salix gracilistyla (SG), and their interspecific hybrid (SH)]. R2, the root mean square error (RMSE), and accuracy of each component are shown.

Species | Component | R2 | RMSE | Accuracy
SC | Comp 1 | 0.8038 | 0.3598 | 0.992
SC | Comp 2 | 0.8617 | 0.3020 | 1
SC | Comp 3 | 0.9147 | 0.2372 | 1
SC | Comp 4 | 0.9208 | 0.2286 | 1
SC | Comp 5 | 0.9359 | 0.2057 | 1
SC | Comp 6 | 0.9359 | 0.2056 | 1
SC | Comp 7 | 0.9393 | 0.2001 | 1
SG | Comp 1 | 0.2279 | 0.7137 | 0.792
SG | Comp 2 | 0.7397 | 0.4144 | 0.983
SG | Comp 3 | 0.7485 | 0.4074 | 0.992
SG | Comp 4 | 0.8167 | 0.3478 | 0.992
SG | Comp 5 | 0.8179 | 0.3466 | 0.992
SG | Comp 6 | 0.8282 | 0.3366 | 1
SG | Comp 7 | 0.8283 | 0.3366 | 1
SH | Comp 1 | 0.1192 | 0.9254 | 0.758
SH | Comp 2 | 0.7394 | 0.5033 | 0.967
SH | Comp 3 | 0.7520 | 0.4910 | 0.983
SH | Comp 4 | 0.8299 | 0.4067 | 1
SH | Comp 5 | 0.8352 | 0.4003 | 0.992
SH | Comp 6 | 0.8416 | 0.3924 | 1
SH | Comp 7 | 0.8434 | 0.3901 | 1

Fig. 3. Results of qualitative vegetative characteristics frequency investigation of (A) Salix caprea (SC), (B) Salix gracilistyla (SG), and (C) their interspecific hybrid (SH). Qualitative vegetative characteristics were investigated based on the data shown in Fig. 2. Abbreviations of characteristics are listed in Table 1. Blue color indicates the SC type, orange indicates the SG type, and green indicates the mixed SC and SG type.

The regression coefficients, by class, for the seven characteristics included in the second PLS-DA model are shown in Fig. 6. Because it represented the relative magnitude and direction of the effect of each characteristic in species discrimination, the regression coefficient plot indicated the following. First, the direction of effects of four characteristics (LR, LV, LBH, and SN) of SG on species classification was opposite to that of SC and SH. Thus, a tree with a higher leaf length/width ratio (LR), lateral veins joining together before reaching the margin (LV), straight hairs on the lower leaf surface (LBH), and fewer stipule serrations (SN) was more likely to be classified as SG by the second PLS-DA model, but the reverse was likely for SH and SC. Second, the direction of effects of two characteristics (BC and BL) of SH was opposite to that of SC and SG. A tree with red branchlets and long winter buds was classified as SH, but the reverse was likely for SC and SG. Third, the direction of WBH type of SC was opposite to that of both SH and SG.
A tree with glabrous winter buds was more likely to be classified as SC, but the reverse was likely for SH.

The classification performance of the second model for the testing set is shown in Table 5. The second model showed a mean accuracy of 94% in the classification (86% for SH, 96% for SG, and 100% for SC), a mean sensitivity of 87.8% (80% for SG, 83.3% for SH, and 100% for SC), and a mean specificity of 96.7% (90% for SH, and 100% for SC and SG). The classification performance of the first model for the testing set is also shown in Table 5. Compared with the first model, the second model showed lower classification performance in terms of accuracy, sensitivity, and specificity. However, the classification performance was not very different between the two models, and in both models the misclassifications occurred in SG and SH. Therefore, considering these two facts, it seemed that the second PLS-DA model with seven characteristics could be used to discriminate and classify the three classes.

Fig. 4. The root mean square error (RMSE) value of the leave-one-out cross-validation (LOOCV) for the partial least squares discrimination analysis (PLS-DA) model with all 27 predictor variables. (A) Salix caprea (SC), (B) Salix gracilistyla (SG), and (C) their interspecific hybrid (SH).

Discussion

The second PLS-DA model with seven characteristics (BL, SN, LR, LV, LBH, BC, and WBH) could discriminate SC, SG, and SH with an 86% to 100% accuracy (100% for SC, 96% for SG, and 86% for SH). This accuracy was lower than that obtained with the first PLS-DA model with 27 characteristics (100% for SC, 98% for SG, and 92% for SH). For more accurate classification of the three species, it would be better to use all 27 characteristics included in the first PLS-DA model rather than the seven characteristics in the second model.
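The per-class test-set figures quoted above follow directly from the one-vs-rest counts reported in Table 5. The sketch below recomputes them for the second (reduced) model using the usual definitions of sensitivity, specificity, and accuracy; the helper name `class_metrics` is an assumption for illustration, while the counts are those listed in Table 5.

```python
def class_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) for one class."""
    sensitivity = tp / (tp + fn)              # in-class trees correctly accepted
    specificity = tn / (tn + fp)              # out-class trees correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# One-vs-rest counts for the second model on the 50 test trees (Table 5).
second_model = {
    "SC": dict(tp=10, fp=0, tn=40, fn=0),
    "SG": dict(tp=8, fp=0, tn=40, fn=2),
    "SH": dict(tp=25, fp=2, tn=18, fn=5),
}

metrics = {cls: class_metrics(**counts) for cls, counts in second_model.items()}
for cls, (sens, spec, acc) in metrics.items():
    print(f"{cls}: sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.2f}")
# SC: sensitivity=1.000 specificity=1.000 accuracy=1.00
# SG: sensitivity=0.800 specificity=1.000 accuracy=0.96
# SH: sensitivity=0.833 specificity=0.900 accuracy=0.86
```

Because SH makes up 30 of the 50 test trees, its five false negatives weigh most heavily on the overall result, which is consistent with SH having the lowest accuracy of the three classes.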
However, measuring all 27 characteristics is expensive; hence, the second PLS-DA model with seven characteristics appears to be more desirable and practical in terms of cost-effectiveness.

In addition, the second model showed lower discriminant accuracy for SG and SH than for SC (Table 5). It misclassified two SGs into SH and could not classify five SHs. The misclassification and nonclassification of the second model were caused by the similarity between SG and SH in the seven characteristics included in the model (Table 2, Fig. 3). This similarity could be due to the unintentional use of SC similar to SG in the seven characteristics. It is very difficult to obtain progenies that are distinct from their parents through just one round of breeding, as most characteristics of tree species are polygenic traits (Sewell and Neale, 2000; Weih et al., 2006). Furthermore, a specific genotype combination of the multiple genes related to the best performance of the given characteristics can be obtained only through repeated multiple-generation breeding between the highest-grade progenies. Thus, subsequent hybridization experiments are also needed to create SHs that are more distinct from SC and SG. Two characteristics (BC and BL) that significantly influenced the discrimination of SH from SC and SG can be used as criteria for selecting SH individuals as mating parents in the hybridization. In particular, it would be desirable to hybridize SH parents with higher grades of BC and BL to develop a more distinct SH variety.

If an application were made to register one of the SHs more distinct from SC and SG for protection as a new SH variety, this SH would have to be tested for the DUS of its characteristics using the DUS test guidelines of the related available species according to the Act on the Protection of New Varieties of Plants (Korea Ministry of Agriculture Food and Rural Affairs, 2017). The DUS test guidelines for SH have not been prepared yet, so the guideline for SC, which was established by the Korea Forest Service in 2020, will inevitably have to be used as an alternative (Korea NFSV, 2019). However, the DUS test guidelines on SC do not include six of the seven characteristics that have significantly contributed to the discrimination among SC, SG, and SH (the six characteristics being LBH, LV, SN, BC, BL, and WBH). Consequently, the guidelines for SC need to be reestablished to include these six characteristics.

Table 4. Results of the partial least squares discrimination analysis (PLS-DA) model with four-component latent variables of Xs predictors (27 variables) (first model) and the PLS-DA model with four-component latent variables of Xs predictors (seven variables) (second model) on the training dataset [120 observations: 25 observations for Salix caprea (SC), 25 for Salix gracilistyla (SG), and 70 for their interspecific hybrid (SH)] shown by three classes (SC, SG, and SH).

Model | Class | R2 | RMSE (z) | X cumulative (%) | Y cumulative (%) | Specificity | Sensitivity | Accuracy
First model | SC | 0.9208 | 0.2286 | 55.12 | 85.58 | 1.00 | 1.00 | 1.00
First model | SG | 0.8167 | 0.3478 | 55.12 | 85.58 | 1.00 | 0.96 | 0.99
First model | SH | 0.8299 | 0.4067 | 55.12 | 85.58 | 1.00 | 1.00 | 1.00
Second model | SC | 0.8969 | 0.2608 | 95.27 | 77.68 | 1.00 | 1.00 | 1.00
Second model | SG | 0.6745 | 0.4634 | 95.27 | 77.68 | 0.98 | 0.92 | 0.97
Second model | SH | 0.7590 | 0.4840 | 95.27 | 77.68 | 0.92 | 0.97 | 0.95

z RMSE = root mean square error.

In conclusion, the results of the present study on the discrimination of SC, SG, and SH using 27 vegetative characteristics and PLS-DA methods clearly indicated the following two advantages of PLS-DA.
First, PLS-DA can create a model from a linear combination of multiple intercorrelated characteristics while remaining relatively free of the multicollinearity and dimensionality problems that are the main limitations of LDA (Barker and Rayens, 2003). Second, PLS-DA facilitated the selection of the characteristics that most strongly influenced the discrimination of SC, SG, and SH, and allowed the relative importance and direction of influence of the selected characteristics to be compared using their regression coefficients (Ballabio and Consonni, 2013). Therefore, PLS-DA methods are expected to contribute greatly to related studies investigating identification, discrimination, classification, and breeding, if used along with cluster analysis and PCA.

Fig. 5. The variable influence on projection (VIP) values by predictor obtained from the partial least squares discrimination analysis (PLS-DA) model with four-component latent variables of Xs predictors (27 variables) on the training dataset by species. (A) Salix caprea (SC), (B) Salix gracilistyla (SG), and (C) their interspecific hybrid (SH). Abbreviations of flower characteristics are the same as those listed in Table 1.

Table 5. Confusion matrix for the results of the partial least squares discrimination analysis (PLS-DA) model with four-component latent variables of Xs predictors (27 variables) (first model) and the PLS-DA model with four-component latent variables of Xs predictors (seven variables) (second model) on the test dataset [50 observations: 10 observations for Salix caprea (SC), 10 for Salix gracilistyla (SG), and 30 for their interspecific hybrid (SH)] shown by three classes (SC, SG, and SH). Bold numbers indicate misclassification, and italic numbers indicate nonclassification.

                       First model         Second model
Class                  SC    SG    SH      SC    SG    SH
TP (true positives)    10    9     27      10    8     25
FP (false positives)   0     0     1       0     0     2
TN (true negatives)    40    40    19      40    40    18
FN (false negatives)   0     1     3       0     2     5
Specificity            1.00  1.00  0.95    1.00  1.00  0.90
Sensitivity            1.00  0.90  0.90    1.00  0.80  0.83
Accuracy               1.00  0.98  0.92    1.00  0.96  0.86

Fig. 6. Regression coefficients plot obtained from the second partial least squares discrimination analysis (PLS-DA) model with seven predictor variables. Blue: Salix caprea (SC); Yellow: Salix gracilistyla (SG); Green: their interspecific hybrid (SH).

Literature Cited
Ballabio, D. and V. Consonni. 2013. Classification tools in chemistry. Part 1: Linear models. PLS-DA. Anal. Methods 5:3790–3798, doi: 10.1039/C3AY40582F.
Barker, M. and W. Rayens. 2003. Partial least squares for discrimination. J. Chemometr. 17:166–173, doi: 10.1002/cem.785.
Brereton, R.G., J. Jansen, J. Lopes, F. Marini, A. Pomerantsev, O. Rodionova, J.M. Roger, B. Walczak, and R. Tauler. 2018. Chemometrics in analytical chemistry-part II: Modeling, validation, and applications. Anal. Bioanal. Chem. 410:6691–6704, doi: 10.1007/s00216-018-1283-4.
Bruce, P. and A. Bruce. 2017. Practical statistics for data scientists. O’Reilly Media, Inc., Sebastopol, CA.
Cho, H.J., H. Woo, J. Lee, and K.H. Cho. 2008. Changes in riparian vegetation after restoration in a urban stream, Yangjae stream (in Korean with English abstract). J. Wet. Res. 10(3):111–124.
Choi, H. and J.G. Kim. 2015. Study on characteristics of seed germination and seedling growth in Salix gracilistyla for invasive species management (in Korean with English abstract). J. Korea. Soc. Environ. Restor. Technol. 18(3):79–95, doi: 10.13087/kosert.2015.18.3.79.
Chong, I.G. and C.H. Jun. 2005. Performance of some variable selection methods when multicollinearity is present. Chemom. Intell. Lab. Syst.
78:103–112, doi: 10.1016/j.chemolab.2004.12.011.
De Luca, M., W. Terouzi, F. Kzaiber, G. Ioele, A. Oussama, and G. Ragno. 2012. Classification of Moroccan olive cultivars by linear discriminant analysis applied to ATR-FTIR spectra of endocarps. Int. J. Food Sci. Technol. 47:1286–1292, doi: 10.1111/j.1365-2621.2012.02972.x.
De Mendiburu, F. and R. Simon. 2015. Agricolae - Ten years of an open source statistical tool for experiments in breeding, agriculture and biology. PeerJ PrePrints 3:e1404v1. <https://doi.org/10.7287/peerj.preprints.1404v1>.
Fonville, J.M., S.E. Richards, R.H. Barton, C.L. Boulange, T.M.D. Ebbels, J.K. Nicholson, E. Holmes, and M.-E. Dumas. 2010. The evolution of partial least square models and related chemometric approaches in metabonomics and metabolite phenotyping. J. Chemometr. 24:636–649, doi: 10.1002/cem.1359.
Galdon, B.R., L.H. Rodríguez, D.R. Mesa, H.L. Leon, N.L. Perez, E.M.R. Rodríguez, and C.D. Romero. 2012. Differentiation of potato cultivars experimentally cultivated based on their chemical composition and by applying linear discriminant analysis. Food Chem. 133:1241–1248, doi: 10.1016/j.foodchem.2011.10.016.
Hur, S.H., S.W. Kim, and B.W. Min. 2015. Discrimination of cultivars and cultivation origins from the sepals of dry persimmon using FT-IR spectroscopy combined with multivariate analysis (in Korean with English abstract). Korean J. Food Sci. Technol. 47:20–26, doi: 10.9721/KJFST.2015.47.1.20.
Kong, W., C. Zhang, F. Liu, P. Nie, and Y. He. 2013. Rice seed cultivar identification using near-infrared hyperspectral imaging and multivariate data analysis. Sensors (Basel) 13:8916–8927, doi: 10.3390/s130708916.
Korea Ministry of Agriculture Food and Rural Affairs. 2017. Act on the protection of new varieties of plants (Act No. 15075, 28 Nov. 2017). <https://www.law.go.kr/LSW/eng/engLsSc.do?menuId=2&section=bdyText&query=15075&x=0&y=0#liBgcolor0>.
Korea National Forest Seed and Variety Center (NFSV). 2019.
Guidelines for measuring characteristics by crop for examination of new variety: Salix caprea L. Chungju, South Korea (in Korean).
Korir, N.K., J. Han, L. Shangguan, C. Wang, E. Kayesh, Y. Zhang, and J. Fang. 2012. Plant variety and cultivar identification: Advances and prospects. Crit. Rev. Biotechnol. 15:111–125, doi: 10.3109/07388551.2012.675314.
Kucheryavskiy, S. 2020. mdatools - R package for chemometrics. Chemom. Intell. Lab. Syst. 198:103937, doi: 10.1016/j.chemolab.2020.103937.
Kucheryavskiy, S. 2021. Getting started with mdatools for R. 29 Mar. 2021. <https://mdatools.com/docs/index.html>.
Kuhn, M. 2008. Building predictive models in R using the caret package. J. Stat. Softw. 28(5):1–26, doi: 10.18637/jss.v028.i05.
Kuhn, M. and K. Johnson. 2013. Applied predictive modeling. Springer, New York, NY, doi: 10.1007/978-1-4614-6849-3.
Kwon, Y.K., M.S. Ahn, J.S. Park, J.R. Liu, D.S. In, B.W. Min, and S.W. Kim. 2014. Discrimination of cultivation ages and cultivars of ginseng leaves using Fourier transform infrared spectroscopy combined with multivariate analysis. J. Ginseng Res. 38(1):52–58, doi: 10.1016/j.jgr.2013.11.006.
Lee, T.B. 2003. Coloured flora of Korea. Hayangmunsa, Seoul, Korea. Vol. 2. (in Korean).
Mabood, F., F. Jabeen, J. Hussain, A. Al-Harrasi, A. Hamaed, S.A.A. Al Mashaykhi, Z.M.A. Al Rubaiey, S. Manzoor, A. Khan, Q.M.I. Haq, S.A. Gilani, and A. Khan. 2017. FT-NIRS coupled with chemometric methods as a rapid alternative tool for the detection & quantification of cow milk adulteration in camel milk samples. Vib. Spectrosc. 92:245–250, doi: 10.1016/j.vibspec.2017.07.004.
Peerbhay, K.Y., O. Mutanga, and R. Ismail. 2013. Commercial tree species discrimination using airborne AISA Eagle hyperspectral imagery and partial least squares discriminant analysis (PLS-DA) in KwaZulu-Natal, South Africa. ISPRS J. Photogramm. Remote Sens. 79:19–28, doi: 10.1016/j.isprsjprs.2013.01.013.
Perez-Enciso, M. and M. Tenenhaus. 2003.
Prediction of clinical outcome with microarray data: A partial least squares discriminant analysis (PLS-DA) approach. Hum. Genet. 112:581–592, doi: 10.1007/s00439-003-0921-9.
Rajalahti, T., R. Arneberg, A. Kroksveen, M. Berie, K.M. Myhr, and M. Kvalheim. 2009. Discriminating variable test and selectivity ratio plot: Quantitative tools for interpretation and variable (biomarker) selection in complex spectral or chromatographic profiles. Anal. Chem. 81:2581–2590, doi: 10.1021/ac802514y.
Ruiz-Perez, D., H. Guan, P. Madhivanan, K. Mathee, and G. Narasimhan. 2020. So you think you can PLS-DA? BMC Bioinformatics 21:1–10, doi: 10.1186/s12859-019-3310-7.
Seo, H.N., S.B. Chae, H.I. Lim, W. Cho, and W.Y. Lee. 2021. The flower morphological characteristics of Salix caprea × Salix gracilistyla. J. For. Environ. Sci. 37:35–43, doi: 10.7747/JFES.2021.37.1.35.
Sewell, M.M. and D.B. Neale. 2000. Mapping quantitative traits in forest trees, p. 407–423. In: S.M. Jain and S.C. Minocha (eds.). Molecular biology of woody plants. Forestry Sciences, Vol. 64. Springer, Dordrecht, doi: 10.1007/978-94-017-2311-4_17.
Shrestha, S., L.C. Deleuran, and R. Gislum. 2016. Classification of different tomato seed cultivars by multispectral visible-near infrared spectroscopy and chemometrics. J. Spectral Imaging 5:1–9, doi: 10.1255/jsi.2016.a1.
Sroute, L., B.D. Byrd, and S.W. Huffman. 2020. Classification of mosquitoes with infrared spectroscopy and partial least squares-discriminant analysis. Appl. Spectrosc. 74:900–912, doi: 10.1177/0003702820915729.
UPOV. 2002. General introduction to the examination of distinctness, uniformity and stability and the development of harmonized descriptions of new varieties of plants. TG/1/3. Union Internationale pour la Protection des Obtentions Vegetales, Geneva, Switzerland. <https://www.upov.int/en/publications/tg-rom/tg001/tg_1_3.pdf>.
UPOV. 2006.
International Union for the protection of new varieties of plants. WILLOW. UPOV Code: SALIX. Guidelines for the conduct of tests for distinctness, uniformity and stability. TG/72/6. Union Internationale pour la Protection des Obtentions Vegetales, Geneva, Switzerland. <https://www.upov.int/edocs/tgdocs/en/tg072.pdf>.
Vaculík, M., C. Konlechner, I. Langer, W. Adlassnig, M. Puschenreiter, A. Lux, and M.T. Hauser. 2012. Root anatomy and element distribution vary between two Salix caprea isolates with different Cd accumulation capacities. Environ. Pollut. 163:117–126, doi: 10.1016/j.envpol.2011.12.031.
Villa, J.E.L., N.R. Quiñones, F. Fantinatti-Garboggini, and R.J. Poppi. 2019. Fast discrimination of bacteria using a filter paper-based SERS platform and PLS-DA with uncertainty estimation. Anal. Bioanal. Chem. 411:705–713, doi: 10.1007/s00216-018-1485-9.
Weih, M., A.C. Rönnberg-Wästljung, and C. Glynn. 2006. Genetic basis of phenotypic correlations among growth traits in hybrid willow (Salix dasyclados × S. viminalis) grown under two water regimes. New Phytol. 170:467–477, doi: 10.1111/j.1469-8137.2006.01685.x.
Wold, S., M. Sjöström, and L. Eriksson. 2001. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58(2):109–130, doi: 10.1016/S0169-7439(01)00155-1.
Wu, Z.Y. and P.H. Raven. 1999. Flora of China. Vol. 4 (Cycadaceae through Fagaceae). Science Press, Beijing, and Missouri Botanical Garden Press, St. Louis, doi: 10.1111/j.1756-1051.1999.tb01142.x.
Yan, S.M., J.P. Liu, L. Xu, X.S. Fu, H.F. Cui, Z.Y. Yun, X.P. Yu, and Z.H. Ye. 2014. Rapid discrimination of the geographical origins of an oolong tea (anxi-tieguanyin) by near-infrared spectroscopy and partial least squares discriminant analysis. J. Anal. Methods Chem. 1:704971, doi: 10.1155/2014/704971.
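
Supplementary note. As a worked check of the classification measures reported for the test dataset, the minimal Python sketch below recomputes specificity, sensitivity, and accuracy from the TP, FP, TN, and FN counts of each class and model. It assumes the usual one-vs-rest definitions: specificity = TN/(TN + FP), sensitivity = TP/(TP + FN), and accuracy = (TP + TN)/(TP + FP + TN + FN). The names Counts, TEST_COUNTS, and measures are illustrative only and are not part of the original analysis, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """One-vs-rest counts for a single class (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def measures(c: Counts) -> tuple[float, float, float]:
    """Return (specificity, sensitivity, accuracy), rounded to two decimals."""
    specificity = c.tn / (c.tn + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return (round(specificity, 2), round(sensitivity, 2), round(accuracy, 2))


# Counts copied from the test-dataset table (50 observations per model).
TEST_COUNTS = {
    ("first", "SC"): Counts(tp=10, fp=0, tn=40, fn=0),
    ("first", "SG"): Counts(tp=9, fp=0, tn=40, fn=1),
    ("first", "SH"): Counts(tp=27, fp=1, tn=19, fn=3),
    ("second", "SC"): Counts(tp=10, fp=0, tn=40, fn=0),
    ("second", "SG"): Counts(tp=8, fp=0, tn=40, fn=2),
    ("second", "SH"): Counts(tp=25, fp=2, tn=18, fn=5),
}

if __name__ == "__main__":
    # Print one line per model and class for comparison with the table.
    for (model, species), counts in TEST_COUNTS.items():
        spec, sens, acc = measures(counts)
        print(f"{model:6s} {species}: specificity={spec:.2f} "
              f"sensitivity={sens:.2f} accuracy={acc:.2f}")
```

Under these definitions the recomputed values agree with the tabulated ones; for example, SH in the second model gives specificity 18/20, sensitivity 25/30, and accuracy 43/50.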