The Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Mulberry (Morus alba L.)

Mulberry (Morus) is one of the [...] with high development prospects. In this study, the chloroplast genomes of two mulberry species, M. multicaulis and M. atropurpurea, were assembled from reads generated on the PacBio and Illumina sequencing platforms. The M. multicaulis chloroplast genome contains a large single-copy (LSC) region of 87,880 bp and a small single-copy (SSC) region of 19,835 bp. The M. atropurpurea chloroplast genome (158,776 bp) consists of an LSC region (87,670 bp), an SSC region (19,750 bp) and two inverted repeats (IRa and IRb) of 25,667 bp each, and is shorter than that of M. multicaulis. Each cpDNA contains 113 functional genes, including 79 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The GC contents of M. multicaulis and M. atropurpurea were 36.24% and 36.26%, respectively. MEGA X was used to construct a phylogenetic tree of 26 species. The results show that the Morus species are more closely related to their congeners than to other species. This study provides detailed information on cpDNA evolution and structure, which is important for chloroplast genome projects and for the development of molecular markers in Morus species.


Introduction
The chloroplast (cp) is the photosynthetic organelle and one of the most important organelles in green plants and algae [1]. The origin of chloroplasts can be dated back to about 390,000 years ago [2]. The mainstream view is the endosymbiotic theory, which holds that plant chloroplasts originated from ancient cyanobacteria that lived as symbionts in primitive eukaryotic cells [3]. In angiosperms, the chloroplast genome (cpDNA) typically consists of a pair of inverted repeat regions (IRa and IRb) separated by a small single-copy (SSC) region and a large single-copy (LSC) region [4]. Contraction and expansion of the IR regions [5,6] largely determine the length of the chloroplast genome.
The GC content of the chloroplast genome is generally low, about 37% on average, reflecting an AT bias [8]. Chloroplast sequences are widely used in phylogenetics, species identification, population genetics and genetic engineering [9][10][11]. Recent reports include phylogenetic analyses of rice [12], of Capsicum L., whose taxonomic status was clarified (Elmosallamy et al. 2019), and of Korean ginseng [13]. However, cpDNA sequence data for Moraceae remain incomplete and limited.
A phylogenetic study of mulberry based on the nuclear genome reclassified mulberry into eight species [14]. However, these conclusions are not sufficient to resolve the complex systematic origin, evolutionary relationships and phylogeny of mulberry.
DNA molecular markers such as indels, SSRs and small inversions [15] and ITS regions [16,17] have been used to study genetic and genomic diversity and for phylogenetic analysis. In this study, the cpDNA sequences of M. atropurpurea and M. multicaulis were investigated, and a comparative analysis was performed between cultivated Morus species and M. mongolica. Genome structure, gene order, repeat sequences and phylogenetic relationships were analyzed.

Quality Control of Sequencing Data
To improve the accuracy of the analysis, the raw reads were further filtered according to the following criteria:
1. Sequencing adapters and primer sequences were removed from the reads;
2. Reads with an average quality value below Q5 were filtered out;
3. Reads containing more than 5 Ns were removed.
The high-quality reads that passed the above checks, called clean reads, were used in subsequent analyses.
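The three filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: it assumes FASTQ records already parsed into (name, sequence, quality) tuples, Phred+33 quality encoding, and trimming at the first exact match of a single adapter. The adapter string `AGATCGGAAGAGC` (the common Illumina adapter prefix) and all function names are hypothetical stand-ins, since the text names neither the adapter or primer sequences nor the filtering tool.

```python
from statistics import mean

# Hypothetical adapter; the actual adapter/primer sequences are not given in the text.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Criterion 1: cut the read at the first exact occurrence of the adapter."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(ch) - offset for ch in qual)


def passes_filters(seq, qual, min_mean_q=5, max_n=5):
    """Criteria 2 and 3: drop low-quality reads and reads with more than max_n Ns."""
    if not seq:  # nothing left after adapter trimming
        return False
    if mean_quality(qual) < min_mean_q:
        return False
    return seq.upper().count("N") <= max_n


def clean_reads(records):
    """Yield (name, seq, qual) tuples that survive all three criteria."""
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq, qual)
        if passes_filters(seq, qual):
            yield name, seq, qual
```

For example, a read whose adapter starts at position 4 is trimmed to its first four bases, while a read with six Ns, or one whose mean quality is Q0, is discarded.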

Assembly and Annotation
SPAdes (v3.13.0) was used for genome assembly [18]. The candidate chloroplast sequence was identified from the assembly and annotated, and GeSeq was then used to draw the circular gene map.
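The text states only that SPAdes 3.13.0 was used; the command line is not given. As a hedged sketch, assuming the clean Illumina paired-end reads and the PacBio reads were combined in a single hybrid SPAdes run (an assumption, not stated in the source), the command could be built as below. File names and the thread count are placeholders; `-1`, `-2`, `--pacbio`, `-t` and `-o` are standard SPAdes 3.x options, and the helper function names are illustrative.

```python
import shlex
import subprocess


def spades_hybrid_cmd(r1, r2, pacbio, outdir, threads=8):
    """Build a spades.py command for paired-end Illumina reads plus PacBio reads."""
    return [
        "spades.py",
        "-1", str(r1),            # forward Illumina reads
        "-2", str(r2),            # reverse Illumina reads
        "--pacbio", str(pacbio),  # long reads help bridge repeats such as the IRs
        "-t", str(threads),
        "-o", str(outdir),
    ]


def run_spades(cmd):
    """Run the assembly; requires spades.py on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = spades_hybrid_cmd("clean_R1.fq.gz", "clean_R2.fq.gz", "pacbio.fq.gz", "spades_out")
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch testable without SPAdes installed.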

Analysis of Repeated Sequences
Long repeats include three types: forward (F), palindromic (P) and tandem (T) repeats, which may promote chloroplast genome rearrangement and increase population genetic diversity. We used vmatch (http://www.vmatch.de/) with a minimum repeat size of 30 bp to identify dispersed long repeats in the chloroplast genome.
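vmatch finds maximal repeats using enhanced suffix arrays. As a simpler stand-in, and explicitly not vmatch's algorithm, the sketch below seeds on shared 30-mers (matching the 30 bp minimum above) and extends each seed into a maximal forward or palindromic (reverse-complement) repeat. It assumes an uppercase ACGT sequence, treats the genome as linear (ignoring the circular wrap-around), and does not separately classify tandem repeats; the function names are illustrative.

```python
from collections import defaultdict

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMP)[::-1]


def _kmer_index(genome, k):
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def forward_repeats(genome, min_len=30):
    """Maximal direct repeats as (start1, start2, length), 0-based, start1 < start2."""
    k, n = min_len, len(genome)
    hits = []
    for positions in _kmer_index(genome, k).values():
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                # skip seeds that also match one base to the left: report each repeat once
                if i > 0 and genome[i - 1] == genome[j - 1]:
                    continue
                length = k
                while j + length < n and genome[i + length] == genome[j + length]:
                    length += 1
                hits.append((i, j, length))
    return sorted(hits)


def palindromic_repeats(genome, min_len=30):
    """Maximal inverted (reverse-complement) repeats as (start1, start2, length)."""
    k, n = min_len, len(genome)
    index = _kmer_index(genome, k)
    hits = set()
    for p in range(n - k + 1):
        for r in index.get(revcomp(genome[p:p + k]), []):
            # genome[p:p+k] is the reverse complement of genome[r:r+k];
            # skip the seed if the match also holds one base left of p
            if p > 0 and r + k < n and genome[p - 1] == revcomp(genome[r + k]):
                continue
            length, start_b = k, r
            # extend the first copy rightwards and the second copy leftwards
            while (p + length < n and start_b > 0
                   and genome[p + length] == revcomp(genome[start_b - 1])):
                length += 1
                start_b -= 1
            # the same pair is found from both copies; the set keeps one
            hits.add((min(p, start_b), max(p, start_b), length))
    return sorted(hits)
```

Seeding on exact k-mers equal to the minimum length guarantees that every repeat of at least that length is found, at the cost of memory for the k-mer index; for a roughly 160 kb plastome this is small.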

Comparative Analysis of Chloroplast Genomes of Morus Species with Other Species
The mVISTA online software in shuffle-LAGAN mode was

Phylogenetic Analysis
The MEGA X software was used to determine the phylogenetic [...] (Table 1). The M. multicaulis cpDNA is a circular double-stranded DNA molecule composed of two identical IR regions (25,551 bp), an LSC region (87,880 bp) and an SSC region (19,835 bp). The M. atropurpurea ("Lunjiao40") cpDNA also has a typical quadripartite structure, 158,776 bp long, with IRa and IRb (25,667 bp each) separated by the LSC region (87,670 bp) and the SSC region (19,750 bp). The GC content of the M. multicaulis chloroplast genome is 36.24%; the LSC (33.90%) and SSC (29.24%) regions have relatively lower GC content than the IR regions (42.99%) (Table 2).
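Because IRa and IRb are identical copies, the total length of a quadripartite plastome is LSC + SSC + 2 × IR, and the genome-wide GC content is the length-weighted mean of the regional GC contents with the IR counted twice. A short Python check of this arithmetic on the M. multicaulis values reported above follows; the constant and function names are illustrative.

```python
# Region lengths (bp) and GC fractions reported for M. multicaulis.
M_MULTICAULIS = {"LSC": (87_880, 0.3390), "SSC": (19_835, 0.2924), "IR": (25_551, 0.4299)}
COPIES = {"LSC": 1, "SSC": 1, "IR": 2}  # IRa and IRb are identical copies


def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome."""
    return lsc + ssc + 2 * ir


def weighted_gc(regions):
    """Genome-wide GC fraction from per-region lengths and GC fractions."""
    total = sum(COPIES[r] * length for r, (length, _) in regions.items())
    gc = sum(COPIES[r] * length * frac for r, (length, frac) in regions.items())
    return gc / total


if __name__ == "__main__":
    lsc, ssc, ir = (M_MULTICAULIS[r][0] for r in ("LSC", "SSC", "IR"))
    print(plastome_length(lsc, ssc, ir))               # 158817
    print(round(100 * weighted_gc(M_MULTICAULIS), 2))  # 36.24
```

The reported regional values give 158,817 bp for M. multicaulis and a weighted GC content of 36.24%, consistent with the genome-wide figure. Applied to M. atropurpurea, the same sum (87,670 + 19,750 + 2 × 25,667) gives 158,754 bp, 22 bp less than the stated 158,776 bp, so one of those regional values may need rechecking.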

Analysis of Sequencing Data and Quality Control
The GC contents of the chloroplast genomes are very similar, and no changes were found in the IR regions of the seven mulberry species examined. The cpDNA contains 113 functional genes, including 79 protein-coding genes, 30 tRNA genes and 4 rRNA genes; pseudogenes and ORFs are not included (Table 3). According to function, the genes can be divided into three categories. The first, related to self-replication, contains 70 genes; the second, related to photosynthesis, contains 38 genes; and the third is related to fatty acid and amino acid biosynthesis. In total, 5 genes have unknown functions. These results are similar to those found in other green plants [6].
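Genes located in the IRs occur twice in the sequence, and plastome gene totals such as the 113 above are normally tallied as distinct genes. The sketch below shows such a tally under stated assumptions: annotations are already reduced to (feature type, gene name) pairs, a hypothetical input format since the text does not describe the GeSeq output, and the example list is illustrative rather than taken from Table 3.

```python
from collections import defaultdict

GENE_TYPES = ("CDS", "tRNA", "rRNA")


def count_unique_genes(features):
    """Count distinct gene names per feature type.

    features: iterable of (feature_type, gene_name) pairs. Genes duplicated in
    IRa/IRb (or split into several CDS parts) share a name and are counted once.
    """
    names = defaultdict(set)
    for ftype, gene in features:
        if ftype in GENE_TYPES:
            names[ftype].add(gene)
    counts = {ftype: len(names[ftype]) for ftype in GENE_TYPES}
    counts["total"] = sum(counts.values())
    return counts


# Illustrative input: ndhB and rrn16 appear twice because they lie in the IRs.
example = [("CDS", "psbA"), ("CDS", "ndhB"), ("CDS", "ndhB"), ("tRNA", "trnH-GUG"),
           ("rRNA", "rrn16"), ("rRNA", "rrn16"), ("gene", "psbA")]
print(count_unique_genes(example))  # {'CDS': 2, 'tRNA': 1, 'rRNA': 1, 'total': 4}
```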

Codon Usage
The 79 protein-coding genes in the cpDNA of M. multicaulis and M. atropurpurea comprise 60,765 codons (Table 4).
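Codon usage in plastome studies is commonly summarized as relative synonymous codon usage (RSCU): the observed count of a codon divided by the mean count of all codons encoding the same amino acid, so values above 1 mark codons used more often than expected under uniform synonymous usage. The text does not say which tool produced Table 4; the sketch below is an independent illustration that assumes the standard codon assignments, treats the three stop codons as one synonymous group, and ignores incomplete trailing codons. Function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments (identical in NCBI table 11, the plastid code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_seqs):
    """Count in-frame codons over all CDS; codons with ambiguous bases are skipped."""
    counts = Counter()
    for seq in cds_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous codon family."""
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

Summing the values returned by `codon_counts` over all CDS gives the total codon number of the kind reported above, and `rscu` then shows which synonymous codons are preferred.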

Repeat Sequences Analysis
Long repeats include three types: forward (F), palindromic (P) and tandem (T) repeats, which may promote chloroplast genome rearrangement and increase population genetic diversity. In M. atropurpurea, the longest repeat is 25,678 bp and is located at position 133,098; it is 127 bp longer than the longest repeat in M. multicaulis. The shortest repeat in M. atropurpurea is 31 bp, 8 bp shorter than the shortest repeat in M. multicaulis (Tables 5 & 6). These differences may reflect genome rearrangement and may be related to genetic diversity.

Comparison with Other Species Chloroplast Genomes
We compared these genomes with those of six other species and two exogenous

Discussion
In this study, we collected two cultivated Morus species and found that different accessions of the same species also differ slightly. For M. multicaulis, the cpDNA of "Husang32" is 377 bp shorter than that of "Ribentiancheng" (Table 1). This result agrees with the M. multicaulis sequence used by Li Qiaoli [20]. Similarly, the cpDNA of M. atropurpurea "Lunjiao40" is 337 bp shorter than that of "Yichuanhong" (Table 2). The lengths of the IR and SSC regions differed little among the seven species; most of the length variation was concentrated in the LSC region.
These results indicate that the species are closely related, which was confirmed by the phylogenetic analysis.
Studies of IR expansion and contraction reveal considerable differences even within the same family [13]. Expansion and contraction at the borders between the IR regions (IRa and IRb) and the LSC and SSC regions are believed to play an important role in evolution [21]. Therefore, it is of great interest to compare IR/SC junctions in [...] Usually, cpDNA is also AT-rich [7,22]. The AT content of the four regions of the seven species (

Data Availability Statement
The whole chloroplast genome data have been deposited at NCBI with the following accession numbers: MW548981 and