Molecular phylogeny and chromosomal evolution of endemic species of Sri Lankan Anacardiaceae

M Ariyarathne, D Yakandawala, M Barfuss, J Heckenhauer 4,5 and R Samuel 3
1 Department of Botany, Faculty of Science, University of Peradeniya, Peradeniya.
2 Postgraduate Institute of Science, University of Peradeniya, Peradeniya.
3 Department of Botany and Biodiversity Research, University of Vienna, Austria.
4 LOEWE Centre for Translational Biodiversity Genomics (LOEWE‐TBG), Frankfurt, Germany.
5 Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany.


INTRODUCTION
Family Anacardiaceae R. Br., the cashew family, comprises 70 genera and 985 species of trees, shrubs and subshrubs. Its members are well known for causing contact dermatitis. They make up a considerable fraction of the tropical flora and are distributed across tropical, subtropical and temperate regions, with the Malaysian region as the centre of diversity (Li, 2007; Pell et al., 2010; Zotz, 2013). As a tropical country, Sri Lanka harbours 19 wild species, including 15 endemics, as well as numerous cultivated and hybrid species (MOE, 2012). Some members of the family are cultivated throughout the world for their edible fruits and seeds, such as cashew (Anacardium occidentale L.), mango (Mangifera indica L. and its numerous varieties) and hog plums (Spondias L.). In addition, some species have important medicinal uses: to treat fever (Buchanania Spreng., Comocladia P.Br.), hepatitis (Haematostaphis Hook. f.) and gastrointestinal illness (Anacardium L., Antrocaryon Pierre). Some species are known for quality, rot-resistant timber, such as quebracho (Schinopsis Engl.), and others are used in landscaping (Cotinus Mill. and Toxicodendron Mill.) (Pell et al., 2010).
Earlier investigations suggest that the family originated 55 to 65 million years ago (MYA), in the Paleocene (Hsu, 1983; Muller, 1984). According to Gentry (1982), Anacardiaceae has a Gondwanan origin, a view supported by fossil records as well as by the current worldwide distribution of the family.
Journal of the National Science Foundation of Sri Lanka 48(3), September 2020
Anacardiaceae belongs to phylum Tracheophyta, class Magnoliopsida, order Sapindales. The family was first proposed by Lindley in 1830, and different classification systems have been proposed for it over the years (Table 1). More recently, Kim et al. (2017) sequenced the complete chloroplast genome of Rhus chinensis Mill., and Jo et al. (2017) sequenced the complete plastome of Mangifera indica. These resources should help increase the robustness of phylogenetic interpretation of the family Anacardiaceae in the future. However, despite a number of taxonomic treatments, the phylogenetic positions of many understudied genera remain unresolved.
Chromosome numbers have been used in taxonomic treatments (Cox et al., 1998; Bateman et al., 2003; Schneeweiss et al., 2004; Almeida et al., 2007; Koch et al., 2012). According to Vinicius da Luz et al. (2015), only 14 % of species in the family Anacardiaceae have been investigated cytologically. Raven (1975) concluded that the ancestral base chromosome number of the family is x = 7, suggesting that the family evolved at the tetraploid level. Of the two subfamilies of Anacardiaceae, subfamily Anacardioideae has been studied cytologically since the 1930s. Maheshwari (1934) initiated chromosomal studies on the genus Mangifera, reporting an uncertain chromosome number of 2n = 52 to 58 for M. indica L.; later, Darlington and Ammal (1955) gave the number as 2n = 40. These studies were extended by Mukherjee (1950; 1957) and Pierozzi and Rossetto (2006), who counted chromosomes in different species of Mangifera and varieties of M. indica and confirmed the somatic number as 2n = 40. The Index to Plant Chromosome Numbers (IPCN) lists seven records of the gametophytic count of Semecarpus anacardium L.f. as n = 29 and n = 30 (Mehra, 1976; Gill et al., 1981; Bir et al., 1982; …). Flora Malesiana gives the chromosome count of Semecarpus L.f. as 2n = 60 (Hou, 1978), while Pell et al. (2010) confirmed that the gametophytic count of the genus is n = 30. Chromosome numbers in the genus Pistacia vary: 2n = 24, 28 and 30 (Huang et al., 1989; Parfitt & Badenes, 1997). Among the species of subfamily Spondioideae, 2n = 32 is the common diploid number in Spondias spp. (Almeida et al., 2007) and Dracontomelon dao (Blanco) Merr. & Rolfe (Oginuma et al., 1999). Sclerocarya caffra Sond. has a sporophytic count of 2n = 26 (Paiva & Leitao, 1989), while Lannea coromandelica (Houtt.) Merr. and Poupartia axillaris (Roxb.) King & Prain have gametophytic counts of n = 15 (…) and n = 12 (Mehra, 1976), respectively. Almeida et al. (2007) revealed large blocks of CMA+ heterochromatin in species of the genus Spondias. The number and location of CMA bands vary among Spondias species, and the distribution patterns of these heterochromatin blocks can be used to identify each Spondias species (Almeida et al., 2007).
Chromosomal evolution in angiosperms has been widely discussed in the taxonomic community for decades (Cox et al., 1998; Schneeweiss et al., 2004; Hansen et al., 2006; Mayrose et al., 2009; Duan et al., 2015), yet the cytological evolution of many plant families remains unknown. This study aims to help close this gap by contributing molecular and cytological data on regionally restricted species of the family Anacardiaceae, focusing mainly on endemic and native species in Sri Lanka.
Here we address (1) the phylogenetic position of the endemic and native taxa, using the nuclear ITS and plastid matK regions; (2) chromosome counts for endemic species; and (3) the evolution of chromosome number across the combined phylogeny.

Materials for molecular phylogeny and chromosome counts
Sampling included ten species from four genera, representing all genera with endemic species in the country and three of the four tribes found in Sri Lanka. Collection localities of each species are given in Ariyarathne et al. (2017). All species investigated were included in the separate and combined phylogenetic analyses. Cytological studies were conducted on only five species, because viable seeds from which to obtain actively growing young roots were unavailable for the others. Vegetative propagation on different media was attempted as an alternative, but none of the trials succeeded, owing to high levels of secondary metabolites in the secretions. Semecarpus seeds were stored in airtight bags for 1 to 3 weeks until germination began, then transferred to airtight units with moistened coir dust and kept at about 20 °C under a 12 h light/12 h dark cycle. Seeds of Mangifera zeylanica were sown in sand and transferred to soil after germination. Actively growing rootlets up to 1 cm long were used for chromosome counts.

DNA extraction and PCR amplification
Genomic DNA was extracted from c. 45 mg of silica-gel-dried leaf material (Chase & Hills, 1991) using the Qiagen DNeasy Plant Mini Kit (Qiagen, Hilden, Germany), following the manufacturer's protocol. As an initial step to remove mucilaginous polysaccharides, the ground samples were washed 2 to 10 times with sorbitol buffer until no visible mucilage remained in the sample solution (Russell et al., 2010; Souza et al., 2012).
Phylogenetic relationships were inferred using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI). Trees were rooted with two outgroup taxa of the family Burseraceae: Bursera simaruba (L.) Sarg. and Canarium album Blanco. MP analysis was performed in PAUP* ver. 4.0 (Swofford, 2003). For each dataset (ITS, matK and combined), a heuristic search was run with 1,000 replicates of random sequence addition, tree-bisection-reconnection (TBR) branch swapping and MulTrees in effect, saving up to 10 trees per replicate. Clade support was estimated by bootstrapping with 1,000 replicates, each using a full heuristic search with TBR branch swapping and random sequence addition; only groups with frequencies greater than 50 % were retained. The consistency index (CI) and retention index (RI) of the tree topologies were calculated in PAUP*. ML and BI analyses were performed on the combined matrix. The best-fitting substitution model for each marker was determined with jModelTest ver. 2.1.7 (Guindon & Gascuel, 2003; Darriba et al., 2012) using the corrected Akaike information criterion (AICc). The general time-reversible model with gamma-distributed rate variation across sites (GTRGAMMA) and a proportion of invariable sites was used for the analysis. A rapid bootstrap analysis with 1,000 replicates and a search for the best-scoring ML tree were performed in a single run in RAxML ver. 8.2.4 (Stamatakis, 2014). Posterior probabilities were obtained by Bayesian inference in MrBayes ver. 3.2.6 (Huelsenbeck & Ronquist, 2001; Ronquist & Huelsenbeck, 2003), using the same nucleotide substitution model (GTR+G) as in the ML analysis. Markov chain Monte Carlo (MCMC) chains were run for 10,000,000 generations and sampled every 1,000 generations. The initial 25 % of samples from each run were discarded as burn-in.
A majority-rule consensus tree was calculated from the remaining trees to obtain the posterior probability of each node. The resulting trees were visualised and edited in FigTree ver. 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/, last accessed 2017-12-04).
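The burn-in and majority-rule steps reduce to simple arithmetic and counting. Sampling every 1,000 generations over 10,000,000 generations gives roughly 10,000 samples per run, of which about the first 2,500 are discarded as burn-in. The Python sketch below is illustrative only and is not the MrBayes implementation used in the study: it assumes each sampled tree is already represented as a set of clades (frozensets of tip names), and the helper name `clade_support` is hypothetical. It drops the burn-in fraction and keeps the clades present in more than half of the remaining trees; the same tally applied to bootstrap replicate trees gives the frequencies that were retained above 50 %.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_support(trees: List[Set[Clade]],
                  burnin_fraction: float = 0.25,
                  threshold: float = 0.5) -> Dict[Clade, float]:
    """Return clades found in more than `threshold` of the post-burn-in trees.

    Hypothetical helper: each tree is assumed to be given as a set of clades,
    and each clade as a frozenset of tip names.  For MCMC samples the returned
    frequencies approximate posterior probabilities.
    """
    n_discard = int(len(trees) * burnin_fraction)  # e.g. first 25 % as burn-in
    kept = trees[n_discard:]
    if not kept:
        raise ValueError("no trees remain after burn-in")
    # each clade is counted at most once per tree because trees are sets
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept)
            for clade, n in counts.items()
            if n / len(kept) > threshold}


if __name__ == "__main__":
    ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
    # 16 toy samples: the first 4 (burn-in) contain a clade later abandoned
    samples = [{ac}] * 4 + [{ab, cd}] * 10 + [{ab}] * 2
    for clade, freq in clade_support(samples).items():
        print(sorted(clade), round(freq, 3))
```

Clades with frequencies above 0.5 are always mutually compatible, which is why they can be assembled into a single majority-rule tree.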

Chromosome counts and preparation of karyotypes
The chromosome analysis protocol of Weiss-Schneeweiss et al. (2009) was optimised for members of the family Anacardiaceae, with a pre-treatment to arrest the mitotic spindles. Collected, actively growing root tips were pre-treated with 0.002 M 8-hydroxyquinoline for 2 h at room temperature and 2.5 h at 4 °C. The root tips were then fixed for 2 h in freshly prepared Carnoy's fixative (3:1 ethanol:acetic acid) and stored at -20 °C until use. The standard Feulgen staining method was followed: the stored rootlets were washed with distilled water and hydrolysed in 5N HCl for 30 min. While preparing the karyotypes, more than three digital images were taken in different focal planes to capture the best 2D appearance of each chromosome. Karyotypes were prepared in the image processing software Corel Photo-Paint X7 ver. 17.1.0.572 (© 2014 Corel Corporation) by arranging homologous pairs of chromosomes in descending order of size.
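The final ordering step can be illustrated with a short Python sketch. It assumes that chromosome lengths (in µm) have already been measured from the images and pairs homologues purely by size. This is a simplification, since in practice centromere position and arm ratio also guide pairing. The function name and the example lengths are hypothetical.

```python
from typing import List, Tuple


def arrange_karyotype(lengths_um: List[float]) -> List[Tuple[float, float]]:
    """Pair chromosomes by length and list the pairs from largest to smallest.

    Simplifying assumption: adjacent items in a descending sort are treated
    as a homologous pair.  Real karyotyping also uses centromere position
    and arm ratio.
    """
    if len(lengths_um) % 2:
        raise ValueError("expected an even number of chromosomes (2n)")
    ordered = sorted(lengths_um, reverse=True)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


if __name__ == "__main__":
    # hypothetical measurements for a 2n = 6 cell
    print(arrange_karyotype([1.2, 3.4, 3.3, 1.1, 2.0, 2.1]))
    # [(3.4, 3.3), (2.1, 2.0), (1.2, 1.1)]
```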

Statistics of data matrices and phylogenetic trees
This study focused mainly on combining molecular and cytological data for the regionally restricted endemic and native Anacardiaceae species of Sri Lanka. Of the 15 endemics, nine species, along with one native taxon, were included. All of these species were investigated for the first time at the molecular and cytological levels. Extensive field investigations were carried out throughout the country, and most of the species were found in the Kanneliya Man and Biosphere Reserve (Ariyarathne et al., 2017). Some species have been declared Critically Endangered Possibly Extinct [CR(PE)] because they could not be located in the wild (MOE, 2012).
Length variation was observed among sequences in both the ITS (rDNA) and matK datasets. The nuclear ITS dataset comprises 33 ingroup taxa and the matK dataset 31 taxa, each including the nine endemic taxa and one native taxon from Sri Lanka. Statistics and data characteristics from the maximum parsimony analyses of both gene regions and the combined dataset are given in Table 4.
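For readers less familiar with the CI and RI reported in Table 4, the Python sketch below shows the standard ensemble definitions for unordered characters: CI = M/S and RI = (G - S)/(G - M), where M is the minimum possible number of steps, S the number of steps on the tree and G the maximum possible number of steps on any tree. It is not PAUP*'s code. The function name is hypothetical, it assumes a gap-free matrix with no missing data, and it takes the tree length S as an input from the parsimony program rather than computing it. Parsimony-uninformative characters are included here, which inflates CI; PAUP* can also report CI with them excluded.

```python
from collections import Counter
from typing import Sequence, Tuple


def ensemble_ci_ri(matrix: Sequence[str], tree_length: int) -> Tuple[float, float]:
    """Ensemble consistency index (CI) and retention index (RI).

    matrix: aligned sequences, one string per taxon (assumed gap-free).
    tree_length: number of steps S on the tree, as reported by the
    parsimony program.  Characters are treated as unordered.
    """
    n_taxa = len(matrix)
    m_total = 0  # M: minimum possible steps, summed over characters
    g_total = 0  # G: maximum possible steps on any tree, summed
    for column in zip(*matrix):
        counts = Counter(column)
        m_total += len(counts) - 1                # one step per extra state
        g_total += n_taxa - max(counts.values())  # star tree, commonest state at root
    if g_total == m_total:
        raise ValueError("RI is undefined when G equals M")
    ci = m_total / tree_length
    ri = (g_total - tree_length) / (g_total - m_total)
    return ci, ri


if __name__ == "__main__":
    toy = ["AAC", "AAT", "CGT", "CGT"]  # hypothetical 4 taxa x 3 sites
    print(ensemble_ci_ri(toy, tree_length=4))  # (0.75, 0.5)
```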

Phylogenetic analysis of ITS, matK and combined data matrices
Maximum parsimony analysis of the ITS data matrix recovered the family Anacardiaceae as monophyletic. Within this superclade, a large clade is formed by species of subfamily Anacardioideae, whereas species of subfamily Spondioideae remain paraphyletic together with the genus Campnosperma. The two major clades, tribe Anacardieae and tribe Semecarpeae, have weak bootstrap support. Most relationships among species of tribe Rhoeae remain unresolved. MP analysis of the matK matrix yielded topologies congruent with those from ITS and also recovered Anacardiaceae as monophyletic. Subfamily Anacardioideae forms a monophyletic superclade, and tribe Anacardieae forms a monophyletic subclade. The Sri Lankan endemic Semecarpus species form a monophyletic group within tribe Semecarpeae, whereas non-native Semecarpus species are paraphyletic.
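Whether a group is monophyletic on a rooted tree can be checked mechanically: the group must match exactly the full set of tips below some node. The Python sketch below assumes a rooted tree written as nested tuples of tip names. The function names and the toy topology are hypothetical and do not reproduce the trees obtained in this study. Distinguishing paraphyly from polyphyly needs additional checks that are not shown.

```python
from typing import FrozenSet, Iterator, Set, Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees


def tip_sets(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the set of tips below every node, tips and root included."""
    if isinstance(tree, str):
        yield frozenset([tree])
        return
    below: FrozenSet[str] = frozenset()
    for child in tree:
        child_sets = list(tip_sets(child))
        yield from child_sets
        below = below | child_sets[-1]  # last item is the child's full tip set
    yield below


def is_monophyletic(tree: Tree, group: Set[str]) -> bool:
    """True if some node of the rooted tree subtends exactly `group`."""
    return frozenset(group) in set(tip_sets(tree))


if __name__ == "__main__":
    # toy rooted tree with an outgroup "out"
    toy = ("out", (("a1", "a2"), ("b1", ("b2", "c1"))))
    print(is_monophyletic(toy, {"a1", "a2"}))  # True
    print(is_monophyletic(toy, {"b1", "b2"}))  # False
```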

[Figure: phylogenetic trees, panels ITS, matK and Combined]

Phylogenetic relationships in Anacardiaceae
The family Anacardiaceae forms a large monophyletic group with strong support (1/100/100). One of the main aims of this investigation was to clarify the phylogenetic positions of the Sri Lankan endemic species of Anacardiaceae. Within subfamily Anacardioideae, all the endemic taxa, including Mangifera zeylanica and the Semecarpus species, are well resolved in their subfamilial and tribal positions. The native species Nothopegia beddomei is placed in tribe Semecarpeae, indicating its close relationship to the genus Semecarpus. The non-native Trichoscypha acuminata, which had been assigned to tribe Rhoeae on morphological characters, is recovered within tribe Semecarpeae with weak support. Several researchers (Wannan & Quinn, 1991; Terrazas, 1994; Takhtajan, 1997; Pell, 2004; Mitchell et al., 2006) have tried to resolve ambiguities in the classification and phylogeny of the family Anacardiaceae. The present study sought to establish the phylogenetic positions of the Sri Lankan endemic Anacardiaceae species and to examine their chromosomal evolution. All Sri Lankan endemic species are placed in their corresponding taxonomic positions except Campnosperma zeylanica. Phylogenies from all three datasets (ITS, matK and combined) support the placement of the endemic species in tribes Anacardieae and Semecarpeae, with C. zeylanica as a basal taxon.
Throughout the systematic history of the family Anacardiaceae, the genus Campnosperma has been taxonomically problematic. Wannan and Quinn (1991) assigned Campnosperma to 'Group B' of their classification, together with species of Spondiadeae and three other genera of tribes Anacardieae and Rhoeae.
Terrazas (1994) reported the family Anacardiaceae to be paraphyletic, with Burseraceae nested within the cashew family as sister to tribe Spondiadeae. However, the combined analysis of rbcL and morphological data supported the monophyly of Anacardiaceae (Terrazas, 1994), with a grouping similar to those proposed by Bentham and Hooker (1862) and Wannan and Quinn (1991) and with Campnosperma within subfamily Anacardioideae. Since then, the genus has remained in subfamily Anacardioideae.
Lepidote scales are very rare in the family Anacardiaceae but are characteristic of the genus Campnosperma. These scales are similar to those found in the genus Tapirira (Tapirira lepidota Aguilar & Hammel), a member of subfamily Spondioideae (Hammel et al., 2014). In Anacardiaceae, stigmas are usually capitate and the ovary unilocular. Campnosperma, however, has bilocular ovaries, and almost all species of subfamily Spondioideae have ovaries with more than one locule. Campnosperma and Pegia Coleb. both have discoid stigmas. In addition, members of Campnosperma share characters of tribe Spondiadeae: a Spondias-type endocarp (as categorised by Wannan & Quinn, 1990), partially pachychalazal seeds, stilt roots and polygamodioecious plants. The placement of this genus in subfamily Anacardioideae has therefore been highly controversial, and the present study supports placing Campnosperma in subfamily Spondioideae. This placement could be strengthened by including additional taxa from other regions. Pell (2004) built a phylogeny of Anacardiaceae from plastid matK sequences, using 33 taxa belonging to the five tribes of the family and five species of the outgroup family Burseraceae. The tree topologies of the present study agree with the phylogenetic tree of Pell (2004), which shows tribes Semecarpeae and Anacardieae forming a clade and subfamily Spondioideae as paraphyletic.
The position of the non-native T. acuminata also remains questionable; because its sequences were obtained from GenBank, errors in plant identification and processing cannot be excluded. Further investigations that include properly identified and sequenced individuals are therefore needed to determine the position of this species in the phylogenetic tree.

Chromosome numbers and karyotypes in Anacardiaceae
Chromosome numbers of five endemic Anacardiaceae species, reported here for the first time, are given in Table 5 and Figure 3. Chromosome numbers obtained in this study, together with previously published counts, are given in Table 6.

Phylogeny and chromosomal evolution
Recent advances in molecular biology have allowed a more precise evaluation of the importance of polyploidy in flowering plant evolution, with most, if not all, plants being of ancient paleopolyploid origin (Soltis et al., 2009; Du et al., 2012; Li et al., 2015). Currently, the most appropriate way to estimate ancestral chromosome numbers on phylogenetic trees is with ancestral character reconstruction software such as ChromEvol and Chromploid. However, these programs could not be applied to the present dataset because cytological data are lacking for the foreign taxa. Instead, the chromosome numbers obtained in this study and counts from the literature (Table 6) were mapped manually onto the phylogeny obtained from the combined matrix to elucidate the chromosomal evolution of the family (Figure 4).
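Manual mapping of this kind amounts to reading off the range of known counts below each node. The Python sketch below is a hypothetical helper and not the procedure used to prepare Figure 4. It takes a rooted tree written as nested tuples and a dictionary of 2n values for the tips that have counts, and it returns the (minimum, maximum) 2n below each internal node; tips without counts are skipped. The toy counts echo values quoted in this section, but the tip labels and topology are invented.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees


def clade_ranges(tree: Tree,
                 counts: Dict[str, int]) -> Dict[FrozenSet[str], Tuple[int, int]]:
    """Map each internal node (as its tip set) to the min and max 2n below it."""
    ranges: Dict[FrozenSet[str], Tuple[int, int]] = {}

    def visit(node: Tree) -> Tuple[FrozenSet[str], List[int]]:
        if isinstance(node, str):
            return frozenset([node]), ([counts[node]] if node in counts else [])
        tips: FrozenSet[str] = frozenset()
        values: List[int] = []
        for child in node:
            child_tips, child_values = visit(child)
            tips = tips | child_tips
            values = values + child_values
        if values:  # record only nodes with at least one known count
            ranges[tips] = (min(values), max(values))
        return tips, values

    visit(tree)
    return ranges


if __name__ == "__main__":
    toy = (("t1", "t2"), ("t3", (("t4", "t5"), ("t6", "t7", "t8"))))
    toy_2n = {"t1": 24, "t2": 48, "t3": 30, "t4": 40, "t5": 42, "t6": 50, "t7": 58}
    for tips, (lo, hi) in clade_ranges(toy, toy_2n).items():
        print(sorted(tips), lo, hi)
```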
The two outgroup taxa of Burseraceae have chromosome numbers of 2n = 24 and 2n = 48, consistent with polyploidy (48 = 2 × 24). Chromosome numbers in the family Anacardiaceae range from 2n = 24 to 2n = 58. Most species of tribe Rhoeae have 2n = 30, the exceptions being S. molle (2n = 24) and P. chinensis (2n = 28).
Table 6: Accession numbers for the species downloaded from GenBank, together with reported chromosome numbers, references and putative ploidy levels. Species studied for the first time are indicated in bold, and '*' indicates species endemic to Sri Lanka. Accession numbers marked 'NA' were not included in the corresponding matrices.
Chromosome numbers of species in tribe Anacardieae range from 2n = 40 to 2n = 42. Previous investigations concluded that the diploid number of M. indica is 2n = 40 (Darlington & Ammal, 1955; Mukherjee, 1950; 1957; Pierozzi & Rossetto, 2006). In the present investigation, the chromosome count for the endemic species M. zeylanica is 2n = 42 (Figure 3), which may reflect dysploidy relative to M. indica. Chromosome counts of the endemic Semecarpus species in tribe Semecarpeae increase from 2n = 50 to 2n = 58 across the subclade, suggesting speciation through dysploidy events. The range of chromosome numbers obtained in this study agrees with the counts available in the literature for the genus Semecarpus: n = 29, 30 (Mehra, 1976; Gill et al., 1981; Bir et al., 1982; Pell et al., 2010) and 2n = 60 (Hou, 1978).
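The reasoning used here, that a gain from 2n = 40 to 2n = 42 or from 2n = 50 to 2n = 58 points to dysploidy rather than polyploidy, can be expressed as a crude rule. In the hypothetical Python helper below, a derived count that is an exact whole multiple of the reference count is labelled polyploidy, and any other change is labelled ascending or descending dysploidy according to the change in haploid number n. It is a sketch only: it cannot distinguish several successive dysploid steps from a single one, and it does not identify reductions in ploidy.

```python
from typing import Tuple


def classify_change(reference_2n: int, derived_2n: int) -> Tuple[str, int]:
    """Crudely label a change in somatic (2n) chromosome number.

    Returns (label, amount): the ploidy multiple for polyploidy, otherwise
    the change in haploid number n = 2n / 2.  Assumes even 2n values.
    """
    if reference_2n % 2 or derived_2n % 2:
        raise ValueError("2n values are assumed to be even")
    if derived_2n == reference_2n:
        return ("no change", 0)
    if derived_2n > reference_2n and derived_2n % reference_2n == 0:
        return ("polyploidy", derived_2n // reference_2n)
    delta_n = (derived_2n - reference_2n) // 2
    label = "ascending dysploidy" if delta_n > 0 else "descending dysploidy"
    return (label, delta_n)


if __name__ == "__main__":
    print(classify_change(40, 42))  # ('ascending dysploidy', 1)
    print(classify_change(50, 58))  # ('ascending dysploidy', 4)
    print(classify_change(24, 48))  # ('polyploidy', 2)
```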
According to Raven (1975) and Lewis (2012), the common base numbers of the family Anacardiaceae are x = 14, 15 and 16, with a few exceptions: Mangifera (x = 20), Anacardium (x = 21) and several genera with x = 12. They propose a hypothetical base number of x = 7 for the family, which gave rise to tetraploidy in most taxa. Two main mechanisms of chromosomal evolution, dysploidy (ascending or descending) and polyploidy, have been proposed by previous investigators (Escudero et al., 2014; Mota, 2014). The present study emphasises polyploidy in the evolution of Anacardiaceae; however, the pattern of chromosome number change along the phylogenetic tree indicates ascending dysploidy. According to Wendel (2000), 70 % of angiosperms are considered polyploids, including most Anacardiaceae species.
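Under the hypothesis of a base number x = 7, each somatic count can be written as the nearest whole multiple of x plus a remainder. A zero remainder corresponds to an exact euploid level, and a non-zero remainder to the kind of offset attributed to dysploidy. The hypothetical Python helper below performs only this arithmetic. It says nothing about ancestry, and the result changes if a different base number (for example x = 20 for Mangifera) is assumed.

```python
from typing import Tuple


def nearest_euploid(two_n: int, x: int = 7) -> Tuple[int, int]:
    """Express 2n as (level, offset) with 2n = level * x + offset.

    `level` is the nearest whole multiple of the base number x (ties are
    resolved by Python's round, i.e. half to even), so `offset` is the
    possibly negative dysploid difference.
    """
    if x <= 0 or two_n <= 0:
        raise ValueError("counts and base number must be positive")
    level = round(two_n / x)
    return level, two_n - level * x


if __name__ == "__main__":
    for count in (28, 30, 40, 42):
        level, offset = nearest_euploid(count)
        print(f"2n = {count} = {level}x {offset:+d}")
```

Under x = 7, for instance, 2n = 28 is an exact tetraploid (4x), whereas 2n = 30 is 4x + 2.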
Polyploidy has been a key mechanism driving angiosperm evolution, and reversion from polyploid to diploid levels is very rare. Raven (1975) suggested that the diploid level has been the main pathway of vascular plant evolution and that diploids have given rise to polyploids under a number of circumstances. Polyploidy has also been associated with decreased diversification rates (Escudero et al., 2014). According to Osborn et al. (2003), species that have undergone polyploidy or dysploidy often show phenotypic deviations from their diploid ancestors. These new traits might play a role in natural selection. Some of them, such as increased apomixis, pest resistance, drought tolerance, flowering period, and fruit and leaf size, could improve the chances of survival and lead to selection as economic crops (Osborn et al., 2003).
The mechanisms through which polyploidy drives evolution are not yet fully understood, but it is reasonably assumed that duplicated genes experience relaxed functional constraints, so that mutation and adaptive neofunctionalization can lead to divergent phenotypes (Osborn et al., 2003).
Building on this study's exploration of the phylogenetic positions and chromosomal evolution of Sri Lankan endemic Anacardiaceae species, future comprehensive and collaborative molecular and cytological studies may reveal the forces driving polyploidy along the evolutionary pathway.

CONCLUSION
Several taxonomic treatments based on anatomical, morphological, phytochemical and molecular data have addressed the phylogeny of the family Anacardiaceae. However, the native and endemic Anacardiaceae flora of Sri Lanka has remained understudied. This is the first molecular phylogenetic study to include representatives of all genera with endemic species of the family Anacardiaceae in Sri Lanka. It supports the monophyly of the family Anacardiaceae while recovering the two subfamilies, Anacardioideae and Spondioideae, as paraphyletic, with questionable placement of taxa assigned to subfamily Anacardioideae. The present study also questions the phylogenetic placement of the genus Campnosperma. The cytological evolution of the family shows major dysploidy events. Genera such as Campnosperma and Nothopegia need further investigation with increased sampling, including pantropical species, and more molecular markers. Chromosome evolution in other families also needs to be investigated to reveal broader evolutionary scenarios.