
Biomedical Journal of Scientific & Technical Research

April, 2021, Volume 35, 2, pp 27419-27428

Review Article


Genetic Annotation of BZIP Transcription Factor Family Genes in Soybean by Using Different Soft Wares

Tahira1, Nazakat Nawaz1, Aqsa Saeed2, Waqas Manzoor Bhutta2* and Ahsan Muhayuudine3

Author Affiliations

1Oilseeds Research Program, CSI, NARC, Pakistan

2University of Agriculture, Pakistan

3Oilseeds Research Institute, AARI, Pakistan

Received: April 08, 2021 | Published: April 20, 2021

Corresponding author: Waqas Manzoor Bhutta, University of Agriculture, Faisalabad, Pakistan

DOI: 10.26717/BJSTR.2021.35.005663

Abstract

Transcription factors play a vital role in almost all biological processes. Widespread sequencing of genomic DNA and cDNA has shown that legumes encode more than 2,000 transcription factors per genome. Although these transcription factors play pivotal roles in legume evolution, plant development and differentiation, fewer than 1% of them have been fully characterized genetically. In plants, the family of transcription factors containing a basic leucine zipper domain (BZIP domain) is among the principal transcription factor families, although these factors are found in all other eukaryotes as well. Here, the genetic characterization of the BZIP transcription factor family is performed, which paves the way for the identification and study of the transcription factors of the most complex crop sequenced so far.

Keywords: Soybean; BZIP; Transcription Factors Family; Genetic Characterization

Introduction

Transcription factors are proteins that bind to DNA and then interact with other transcriptional regulators, including chromatin remodeling or modifying proteins, to block or recruit the access of RNA polymerase to the DNA template. They play a vital role in almost all biological processes. Plant genomes dedicate approximately 7% of their coding sequences to transcription factors, which reflects the complexity and importance of transcriptional regulation in plants. The development and differentiation of plants is programmed chiefly at the level of gene transcription, which is regulated by transcription factors together with other proteins that may block or recruit the access of RNA polymerase to the DNA template [1].

Transcription factors are commonly defined as sequence-specific DNA-binding proteins that can either repress or activate transcription. Plant genomes have been reported to encode more transcription factors than animal genomes, a clear indication that transcriptional regulation in plants is as complex as it is in animals [2], since nearly every biological process is directly influenced or regulated by transcription factors [3]. For example, there is a specific class called "general transcription factors", without which transcription does not occur in eukaryotes [4]. Studies have shown that transcription factors are involved in cell development, division, differentiation and migration. The transcription factors of Arabidopsis thaliana have been studied in depth since its genome was sequenced as a model species [3,5,6]. Widespread sequencing of genomic DNA and cDNA has shown that legumes encode more than 2,000 transcription factors per genome. Although these transcription factors play pivotal roles in legume evolution, plant development and differentiation, fewer than 1% of them have been fully characterized genetically. This paves the way for the identification and study of the transcription factors of other newly sequenced crops, such as soybean, through comparative analysis and homology searching.

Soybean is a rich source of protein and contains substantial amounts of all the essential amino acids, including amino acids that the human body cannot synthesize (Henkel, 2000). Over the past 60 years, soybean has become an important crop in numerous countries around the globe [7]. Furthermore, soybean is known to be the most complex crop sequenced so far. The identification and annotation of soybean transcription factors has become more convenient now that homology-based gene prediction and annotation provide putative protein sequences [8].

In plants, the family of transcription factors containing a basic leucine zipper domain (BZIP domain) is among the principal transcription factor families, although these factors are found in all other eukaryotes as well. In plants, these transcription factors regulate genes involved in seed maturation, abiotic stress responses, pathogen defense and flower development [3]. The BZIP domain is present in numerous DNA-binding proteins of eukaryotes. One portion of the domain is a region that mediates sequence-specific DNA binding, and the other portion, called the leucine zipper, holds the two DNA-binding regions together. The DNA-binding region contains numerous basic amino acids such as lysine and arginine. Proteins containing this domain are transcription factors [9,10]. BZIP transcription factors are found in nearly every eukaryote, and BZIP is one of the principal families of dimerizing transcription factors [11,12]. This paper is concerned with the genetic characterization of BZIP transcription factors, which paves the way for the identification and study of the transcription factors of the most complex crop sequenced so far.
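The two-part architecture described above (a basic region followed by a leucine zipper) can be illustrated with a rough sequence scan. The sketch below is only a heuristic and not the annotation method used in this paper: the function name find_bzip_like, the window size, the basic-residue threshold and the toy sequence are all hypothetical choices made for illustration.

```python
import re

# Hypothetical heuristic, not a validated domain model: look for three leucines
# spaced seven residues apart (a heptad repeat), preceded by a window that is
# rich in the basic residues lysine (K) and arginine (R).
BASIC = set("KR")
ZIPPER = re.compile(r"L.{6}L.{6}L")  # leucines at positions i, i+7, i+14

def find_bzip_like(seq, window=16, min_basic=5):
    """Return (basic_region_start, zipper_start) for the first candidate, or None."""
    for match in ZIPPER.finditer(seq):  # non-overlapping matches only
        zipper_start = match.start()
        basic_start = max(0, zipper_start - window)
        upstream = seq[basic_start:zipper_start]
        if sum(residue in BASIC for residue in upstream) >= min_basic:
            return (basic_start, zipper_start)
    return None

# Made-up toy sequence (not a real soybean protein).
toy = "MSG" + "KRARNREAARRSRARK" + "LEAKVSELEAKNAELQ"
candidate = find_bzip_like(toy)
```

In practice, curated domain profiles are the more reliable way to recognize BZIP proteins; the scan above only makes the "basic region plus zipper" layout concrete.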

Review of Literature

Bioinformatics approaches have been instrumental in identifying putative transcription factors in plants. Transcription factor families are commonly defined by the type of DNA-binding domain shared by the proteins in the family, while putative transcription factor genes are identified mainly by the presence of DNA sequences within the genes that encode known DNA-binding domains [2,13,14]. Wolfong et al. in 1997 isolated a cDNA encoding a novel BZIP protein called G/HBF-1, which binds the H-box and the adjacent G-box in the proximal region of the chalcone synthase promoter in soybean.

Haiyang et al. in 2014 proposed that two factors present in soybean, GmFT2a and GmFT5a, redundantly and differentially control photoperiod-regulated flowering through transcriptional upregulation of, and physical interaction with, the BZIP transcription factor GmFDL19, which in turn activates the expression of floral identity genes in soybean. Murilo et al. in 2013 reviewed evidence that plant BZIP transcription factors are also responsive to pathogens. Jakoby et al. [2] placed the BZIP proteins of Arabidopsis into ten major groups: A, B, C, D, E, F, G, H, I and S. The members of each group share a similar basic-region sequence and other common features, such as the position of the leucine zipper in the protein sequence and the size of the leucine zipper domain.

The BZIP transcription factors of soybean (GmbZIPs) were classified by examining 47 BZIP sequences together with 75 AtbZIP proteins. The outcome was a classification of the BZIP transcription factors into 10 groups, as in Arabidopsis. Because this classification is based on conserved domains, it is suitable for the general classification of BZIP proteins in plants [15]. Heinekamp et al. [16] in 2002 analyzed the tobacco BZIP proteins and found that the BZIP protein BZI-1 exhibits all the distinctive features of a transcription factor. It binds DNA, particularly ACGT-containing cis-elements, it is localized in the nucleus, and its N-terminal domain acts as a trans-activation domain in plant cells.

Liao et al. [15] in 2008 studied 131 BZIP-type transcription factor genes in soybean. They analyzed the expression of these genes under different stresses, and three of the genes were investigated further with respect to stress tolerance, transcriptional activation and DNA-binding specificity. Their results indicated that transgenic Arabidopsis plants overexpressing these three genes were less sensitive to ABA but more tolerant to freezing and salt stress. Soybean unigenes were analyzed by Tian et al. [17] in 2004 through EST assembly, and more than 1,000 transcription factor genes were identified as a result. Arabidopsis possesses at least four times as many BZIP genes as worm, yeast and humans [2]. Molecular and genetic studies of a few of these Arabidopsis thaliana BZIP genes show that they regulate varied biological processes, for example stress and light signaling, pathogen defense, flower development and seed maturation. BZIP TFs, whose functions contrast with those of the WRKY and R2R3-MYB TFs, may contribute to greater plant-specific diversity if they were recruited early in plant evolution [18,19]. Chuang et al. [20] in 1999 studied the Perianthia gene, which is involved in determining floral organ number in Arabidopsis thaliana and shows homology with the BZIP transcription factors. Moreover, numerous plant BZIP proteins are known to bind the ACGT cis-acting element, which has been identified in the promoters of plant, bacterial and viral genes [21-26].

Methodology

The BZIP transcription factors of soybean were used for the different analyses, which were performed with MEGA-X and the Soybean Knowledge Base database. Molecular Evolutionary Genetics Analysis (MEGA) is a desktop application designed for the comparative analysis of homologous gene sequences, either from different species or from multigene families, with an emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. Along with statistical tools for data analysis, MEGA provides facilities for assembling sequence datasets from files or web-based repositories, and it includes tools for the visual presentation of results in the form of interactive phylogenetic trees and evolutionary distance matrices [27]. The Soybean Knowledge Base (SoyKB) is a comprehensive web resource for soybean translational genomics. It was designed for better integration and management of soybean genomics, proteomics, metabolomics and transcriptomics data, along with annotation of biological pathways and gene function. It offers tools such as gene family search, Affymetrix probe ID search, a protein 3D structure viewer and metabolite search, and users can upload as well as download annotations and experimental data [28]. The BZIP family genes downloaded from SoyKB and used for the analysis are shown in Table 1.

Table 1: Genes of BZIP Family.

Result

Evolutionary Relationships of Taxa

The evolutionary history was inferred using the Minimum Evolution method of Rzhetsky and Nei [29] (Figure 1). The optimal tree, with a sum of branch lengths of 175.29000652, is shown. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [30] and are in units of the number of base substitutions per site. The Minimum Evolution tree was searched using the Close-Neighbor-Interchange (CNI) algorithm [31] at a search level of 1. The Neighbor-Joining algorithm [32] was used to generate the initial tree. The analysis involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 1: Evolutionary relationships of taxa.

Use of Maximum Likelihood Method for Evolutionary Analysis

The evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model [34] (Figure 2). The tree with the highest log likelihood (-219250.10) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) method, and then selecting the topology with the superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 2: Use of Maximum Likelihood method for evolutionary analysis.

Evolutionary Relationships in Taxa

The evolutionary history was inferred using the Neighbor-Joining method of Saitou and Nei [32] (Figure 3). The optimal tree, with a sum of branch lengths of 200.73212680, is shown. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [30] and are in units of the number of base substitutions per site. The analysis involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 3: Evolutionary relationships in taxa.
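To make the Neighbor-Joining step concrete, the sketch below performs only the first join on a small distance matrix: it computes the standard Q criterion for every pair, picks the pair with the smallest Q, and derives the two branch lengths to the new internal node. The five-taxon matrix and the function name nj_first_join are illustrative assumptions, not data from this study, and MEGA-X remains the tool actually used here.

```python
def nj_first_join(names, d):
    """One neighbor-joining step on a symmetric distance matrix (n > 2 taxa).

    Returns the joined pair, their branch lengths to the new node, and Q.
    """
    n = len(names)
    totals = [sum(row) for row in d]  # sum of distances from each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from the two joined taxa to the new internal node.
    len_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j), q

# Toy distances for five hypothetical taxa.
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths, q_min = nj_first_join(taxa, dist)
```

A full implementation would replace the joined pair with the new node, recompute distances to it, and repeat until the tree is resolved; the sum of all resulting branch lengths is the quantity reported for the optimal tree.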

Maximum Parsimony Analysis of Taxa

The evolutionary history was inferred using the Maximum Parsimony method (Figure 4). The most parsimonious tree, with a length of 80376, is shown. The consistency index is 0.060193 (0.060193), the retention index is 0.310626 (0.310626), and the composite index is 0.018697 (0.018697) for all sites and parsimony-informative sites (in parentheses). The MP tree was obtained using the Subtree-Pruning-Regrafting (SPR) algorithm [31] with a search level of 0, in which the initial trees were obtained by the random addition of sequences (10 replicates). The analysis involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 4: Maximum Parsimony analysis of taxa.
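The three parsimony indices reported above are related: the composite index is the product of the consistency index and the retention index. The short check below uses the values from the text; the variable names are ours, and a small tolerance is allowed because the reported inputs are rounded to six decimals.

```python
# Values as reported in the text (all sites).
consistency_index = 0.060193
retention_index = 0.310626
reported_composite = 0.018697

# Composite index = consistency index x retention index.
computed_composite = consistency_index * retention_index

# The inputs are rounded, so compare with a tolerance instead of exact equality.
agrees = abs(computed_composite - reported_composite) < 1e-5
```

A consistency index this low indicates extensive homoplasy across the 119 sequences, which is worth keeping in mind when interpreting the parsimony tree.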

Evolutionary Relationships of Taxa

The evolutionary history was inferred using the UPGMA method [35] (Figure 5). The optimal tree, with a sum of branch lengths of 194.89363191, is shown. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [30] and are in units of the number of base substitutions per site. The analysis involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 5: Evolutionary relationships of taxa.
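UPGMA builds the tree by repeatedly merging the two closest clusters and averaging distances weighted by cluster size. The following is a minimal sketch on a made-up four-taxon matrix, assuming a symmetric distance matrix; the function name upgma and its return format (nested tuples plus node heights) are illustrative choices and do not reflect how MEGA-X represents trees.

```python
def upgma(names, d):
    """Cluster taxa by UPGMA; return a nested-tuple tree and the node heights."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    heights = []
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        heights.append(dij / 2)  # the new node sits at half the merge distance
        # Distance from the merged cluster to each remaining cluster:
        # average of the old distances, weighted by cluster size.
        new_dist = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights

# Toy ultrametric distances for four hypothetical taxa.
tree, heights = upgma(
    ["A", "B", "C", "D"],
    [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]],
)
```

UPGMA assumes a constant rate of evolution across lineages, which is one reason its tree can differ from the Neighbor-Joining and Minimum Evolution trees built from the same distances.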

Estimation of Average Evolutionary Divergence over All of the Sequence Pairs

The number of base substitutions per site, averaged over all sequence pairs, is shown in Figure 6. The analysis was conducted using the Maximum Composite Likelihood model [30]. It involved 119 nucleotide sequences, and the codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The final dataset contained a total of 1610 positions. Evolutionary analyses were conducted in MEGA-X [33].

Figure 6: Estimation of Average Evolutionary Divergence over all of the Sequence Pairs.
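The overall average described above is simply the mean of the pairwise distances over every unordered pair of sequences. A minimal sketch follows, assuming the distances are available as a symmetric matrix; the function name and the toy matrix are illustrative, while the sequence count of 119 comes from the text.

```python
from itertools import combinations

def mean_pairwise_distance(d):
    """Average of d[i][j] over all unordered pairs i < j."""
    pairs = list(combinations(range(len(d)), 2))
    return sum(d[i][j] for i, j in pairs) / len(pairs)

# Toy 3 x 3 distance matrix (made up).
toy_mean = mean_pairwise_distance([[0, 2, 4], [2, 0, 6], [4, 6, 0]])

# Number of sequence pairs that enter the average for the 119 sequences here.
n_sequences = 119
n_pairs = n_sequences * (n_sequences - 1) // 2
```

With 119 sequences the average is taken over several thousand pairs, so a single number summarizes the divergence of the whole family but hides the variation between subgroups.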

Heat Map Analysis

Figure 7 shows a heat map of the 119 BZIP genes. Individual values contained in the matrix are represented as light and dark colors. Here, red cells denote small values.

Figure 7: Heat Map Analysis.

Conclusion

Transcription factors physically interact with the RNA polymerase complex and with other proteins to alter gene transcription [36]. The exact structures of these complexes are still unknown for most plant genes, even though this knowledge is a precondition for understanding the combinatorial control of transcription [37]. The network of genes regulated by a "single" transcription factor and its partners is only a small part of a larger genetic regulatory network, which ensures the coordinated expression of genes involved in diverse cellular processes, from plant differentiation to plant development. Interpreting these global gene networks will therefore require a combination of bioinformatic, genomic and functional genomic approaches. Considerable progress has been made in research on BZIP transcription factors over the last 20 years through the application of varied approaches and innovative technologies. Additionally, if BZIPs are considered as candidate genes in breeding projects and other crop improvement programs, they could provide a clear understanding of several biotic stress-related "signal transduction" events. This, in turn, would point towards the development of genetically modified crop varieties with improved stress tolerance [38-41].

Conflict of Interests

None.

References
