Jing-qi Yang, Ming Yang* and Qing Huang*
Received: January 13, 2023; Published: January 26, 2023
*Corresponding author: Ming Yang and Qing Huang, Department of Cardiovascular Medicine, Jiangxi Provincial People’s Hospital, Nanchang, China
DOI: 10.26717/BJSTR.2023.48.007632
Background: Early coronary atherosclerosis (CAS) is a common disease and easily develops into coronary heart disease or even myocardial infarction over time. However, the underlying disease mechanism is not yet completely clear.
Methods: The gene expression profile of GSE132651 was used to perform weighted gene co-expression network analysis (WGCNA) and differential expression analysis, to identify the key modules and differentially expressed genes (DEGs) associated with early CAS, respectively. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were then applied. After overlapping the DEGs with the key module genes, we constructed a protein-protein interaction network and identified the hub genes. Receiver operating characteristic (ROC) curves and the Comparative Toxicogenomics Database (CTD) were used to validate the hub genes. Moreover, we also analyzed statins’ therapeutic targets.
Results: A total of 161 differentially expressed genes were filtered. A weighted gene co-expression network was constructed, and genes were classified into 17 modules. Among them, the cyan module, which contained 99 genes, was most closely associated with early CAS. The GO analysis demonstrated that the cyan module was mainly enriched in platelet activation, coagulation, hemostasis, and blood coagulation. The KEGG terms were associated with chromosome segregation, transcription regulation, DNA replication, mitotic sister chromatid segregation, and G1/S transition of the mitotic cell cycle. Finally, ALCAM, BDNF, CRYL1, DYSF, FGF2, FST, LYN, SH3BP5, TGFB2, and TGFBR2 were identified as key genes critical for early CAS pathogenesis. Among them, BDNF, FST, TGFB2, LYN, and SH3BP5 were the targets of statin therapy.
Conclusion: Our study identified several key genes and targets of statin therapy that acted as novel candidates in the etiology of early CAS, which may contribute to the diagnosis and treatment of this disease.
Keywords: Early Coronary Atherosclerosis; Weighted Gene Co-Expression Network Analysis; Key Genes; Statins; Therapeutic Targets
Abbreviations: WGCNA: Weighted Gene Co‑Expression Network Analysis; DEGs: Differentially Expressed Genes; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; CTD: Comparative Toxicogenomics Database; CAS: Coronary Atherosclerosis; CHD: Coronary Heart Disease; CAD: Coronary Artery Disease
Coronary atherosclerosis (CAS) is a complicated metabolic disease and is often a significant part of the pathogenesis of coronary heart disease (CHD). It is characterized by endothelial dysfunction and chronic inflammation that interact with metabolic changes to trigger, propagate, and activate lesions in vessel walls [1,2]. Accumulating studies [3,4] have revealed that endothelial dysfunction is the initiating factor of atherosclerosis and coronary artery disease (CAD); nevertheless, the fundamental mechanisms of early CAS are still poorly understood, and the molecular pathways driving its development need to be explored. CAS tends to develop into acute coronary syndrome [5]. Although several therapies have been used to treat CAS and prevent further stenosis of the coronary arteries, limited efficacy and adverse drug effects make the management of patients with CAD challenging for clinicians. Traditional drug therapy is still one of the treatment methods used for early CAS. Statins, 3-hydroxy-3-methylglutaryl coenzyme-A (HMG-CoA) reductase inhibitors, are a class of lipid-lowering compounds that have recognized effects in the treatment of hypercholesterolemia and the prevention of cardiovascular disease [6]. Increasing evidence suggests that they can improve endothelium-mediated vasodilation, reduce oxidative stress and inflammation, and down-regulate the angiotensin II type I receptor [7]. However, the underlying mechanism of the development or progression of early CAS remains unclear, and statin treatment specifically for early CAS is still lacking. Therefore, it is clinically significant to explore biomarkers or molecular therapeutic targets that may affect early arteriosclerosis.
Weighted Gene Co-Expression Network Analysis (WGCNA) is a method used to explore the correlation between genes and a given feature by performing weighted correlation network analysis [8,9]. The unique advantage of WGCNA is its ability to convert gene expression data into co-expression modules that reveal gene networks and signaling pathways [10,11]. To our knowledge, the application of WGCNA to the identification of regulatory networks of early CAS has not yet been reported. In our study, we obtained microarray gene expression profiles of human Blood Outgrowth Endothelial Cells from the GEO database and integrated the expression profiling datasets to construct a co-expression network with WGCNA, in order to explore the dynamic process of early CAS pathogenesis and development. The expression patterns in the Normal and Abnormal groups and the differences between them were comprehensively analyzed using WGCNA and other specialized bioinformatics tools. The results of the present study may help to clarify the pathogenesis of early CAS, pinpoint the molecular mechanisms involved in the pathological process, and provide insights into novel treatments and therapeutic targets for drug development.
Data Processing
The GSE132651 and GSE32547 datasets used in the present study were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) of the NCBI. All gene expression profiles were acquired from human samples. GSE132651, based on the GPL96 platform, was the dataset from Hebbel, et al. [12] in 2020 and included 6 endothelial cell samples with normal coronary endothelial function (the “Normal group”) and 13 endothelial cell samples with abnormal coronary endothelial function (the “Abnormal group”); of note, all samples were from adults < 45 years old. GSE32547, based on GPL570, was used as a validation set to explore molecular therapeutic targets that may affect early arteriosclerosis and contained gene expression profiles from 9 endothelial cell control samples and 9 statin-treated endothelial cell samples. The normalizeBetweenArrays function in the limma package [13] was used to normalize the gene expression profiles. Probes corresponding to multiple genes were removed. If multiple probes detected the same gene, the expression level of that gene was calculated as the average expression of all its probes.
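The probe-to-gene step described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R and limma); the probe IDs, expression values, and the function name `collapse_probes` are hypothetical, and dropping unannotated probes is an added assumption.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_to_genes, probe_expr):
    """Average probe-level values into gene-level values.

    probe_to_genes: dict of probe ID -> list of gene symbols (hypothetical annotation)
    probe_expr:     dict of probe ID -> list of expression values, one per sample
    """
    by_gene = defaultdict(list)
    for probe, values in probe_expr.items():
        genes = probe_to_genes.get(probe, [])
        # Remove probes that map to several genes; probes without any
        # annotation are also dropped here (an assumption of this sketch).
        if len(genes) != 1:
            continue
        by_gene[genes[0]].append(values)
    # Average the remaining probes of each gene, sample by sample.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in by_gene.items()}

# Toy input with made-up probe IDs and values (two samples).
annotation = {"p1": ["FST"], "p2": ["FST"], "p3": ["LYN", "BDNF"], "p4": ["LYN"]}
expression = {"p1": [2.0, 4.0], "p2": [4.0, 6.0], "p3": [9.0, 9.0], "p4": [1.0, 3.0]}
gene_expr = collapse_probes(annotation, expression)
```

In this toy input, probe p3 is discarded because it maps to two genes, and the two FST probes are averaged per sample.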
WGCNA and Co-expression Network Construction
The WGCNA package [8] for R (ver. 4.0.4) was used to construct a weighted gene co-expression network, identify the co-expressed gene modules, explore the correlation between the gene network and biological traits, and investigate hub genes in the network. First, the genes were ranked by median absolute deviation in descending order, and the top 5,000 genes were selected for co-expression analysis. The Pearson correlation coefficient between genes was then calculated to build the correlation matrix, and a soft threshold was applied to determine whether genes had similar expression profiles. We then converted the weighted adjacency matrix into a topological overlap matrix (TOM) to evaluate network connectivity and applied hierarchical clustering to construct clustering dendrograms. In the dendrogram, different branches and colors represent the corresponding gene modules. Genes were clustered into multiple modules based on the weighted correlation coefficient, so that genes with similar expression patterns were grouped into the same module. After clustering, a heatmap was plotted to visualize the inter-module correlation. The relationship between modules and clinical traits was further evaluated to determine the modules associated with early CAS for subsequent analysis.
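As a rough illustration of these steps, the following Python sketch ranks genes by median absolute deviation, builds a soft-thresholded adjacency matrix, and derives the topological overlap. It assumes an unsigned network, a_ij = |cor(i, j)|^β, and the standard topological overlap formula; the paper does not state the network type, the toy expression values are made up, and the actual analysis was run with the WGCNA R package.

```python
from math import sqrt
from statistics import mean, median

def mad(values):
    """Median absolute deviation of one gene across samples."""
    m = median(values)
    return median([abs(v - m) for v in values])

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(rows, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** beta, zero diagonal."""
    n = len(rows)
    return [[0.0 if i == j else abs(pearson(rows[i], rows[j])) ** beta
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Toy data: four hypothetical genes measured in five samples.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "g4": [3.0, 3.0, 3.0, 3.0, 3.1],  # nearly flat, so it has the lowest MAD
}
top = sorted(genes, key=lambda g: mad(genes[g]), reverse=True)[:3]  # 5,000 in the paper
adj = adjacency([genes[g] for g in top], beta=5)  # beta = 5 as reported in the Results
tom = topological_overlap(adj)
```

Hierarchical clustering would then be run on the dissimilarity 1 - TOM; that step is omitted here for brevity.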
Functional Annotation for Modules of Interest
The R package ‘clusterProfiler’ [14] was used to perform Gene Ontology (GO) [15,16] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [17,18] pathway enrichment analyses. The KEGG pathways and GO terms were visualized with the “GOplot” package [15], and the significance threshold was set to P < 0.05.
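Over-representation analyses of this kind typically rest on a hypergeometric test: given a gene universe, a term's gene set, and a module, how surprising is the observed overlap? The Python sketch below shows that calculation in isolation. The counts in the example are hypothetical and are not taken from the study, and no multiple-testing correction is applied.

```python
from math import comb

def enrichment_p(universe_size, term_size, module_size, overlap):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    universe_size: number of background genes
    term_size:     background genes annotated to the GO/KEGG term
    module_size:   genes in the module of interest
    overlap:       module genes annotated to the term
    """
    upper = min(term_size, module_size)
    total = comb(universe_size, module_size)
    tail = sum(
        comb(term_size, x) * comb(universe_size - term_size, module_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: a 99-gene module drawn from 5,000 background genes,
# with 8 module genes falling in a 60-gene term.
p_value = enrichment_p(5000, 60, 99, 8)
is_enriched = p_value < 0.05
```

A term would be reported as enriched when its P value falls below the chosen threshold (P < 0.05 in this study).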
Differentially Expressed Genes (DEGs) Screening and Interactions with the Modules of Interest
We performed base-2 logarithm transformation, background correction, and quantile normalization on the expression profiles of GSE132651 and GSE32547 using the ‘limma’ package [13] in R 4.0.4. The probe IDs were matched to the corresponding gene symbols based on the annotation file. After pre-processing, genes with adjusted P < 0.05 and an absolute log2 fold change of more than 0.5 between samples with normal and abnormal coronary endothelial function were considered DEGs, as identified with the ‘limma’ package. In addition, the DEGs were overlapped with the genes in the modules of interest, and the overlap was presented in a Venn diagram.
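The screening rule can be sketched in a few lines of Python. The sketch assumes Benjamini-Hochberg adjustment (the paper does not name the adjustment method) and uses the |log2FC| > 0.5 cut-off reported in the Results; the gene names, fold changes, and P values are placeholders.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def call_degs(stats, p_cut=0.05, lfc_cut=0.5):
    """stats: dict of gene -> (log2 fold change, raw P value)."""
    genes = list(stats)
    adj = bh_adjust([stats[g][1] for g in genes])
    up = [g for g, q in zip(genes, adj) if q < p_cut and stats[g][0] > lfc_cut]
    down = [g for g, q in zip(genes, adj) if q < p_cut and stats[g][0] < -lfc_cut]
    return up, down

# Placeholder statistics for four hypothetical genes.
stats = {
    "geneA": (1.2, 0.001),
    "geneB": (-0.9, 0.004),
    "geneC": (0.3, 0.002),   # significant, but fold change too small
    "geneD": (0.8, 0.4),     # large fold change, but not significant
}
up_genes, down_genes = call_degs(stats)
```

The overlap with a module is then a plain set intersection, for example `set(up_genes + down_genes) & module_genes` for a hypothetical `module_genes` set.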
Protein-Protein Interaction (PPI) Network Construction and Hub Gene Identification
After overlapping the DEGs and the genes in the modules of interest, we entered these genes into the STRING database (http://string-db.org), collected the interaction information for the target proteins at a moderate confidence score of > 0.4 [19], and constructed a PPI network using Cytoscape software (v3.8.2) [20]. Subsequently, we identified the top 10 hub genes using ‘cytoHubba’, a Cytoscape plug-in.
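A simplified view of hub selection is sketched below in Python. It assumes that hubs are ranked by node degree after filtering edges at a confidence score above 0.4; cytoHubba offers several ranking methods and the paper does not say which one was used, so degree is only a stand-in. The protein names and scores are placeholders.

```python
from collections import Counter

def top_hubs(edges, min_score=0.4, top_n=10):
    """Rank proteins by degree in the filtered interaction network.

    edges: iterable of (protein_a, protein_b, combined_score between 0 and 1);
           each unordered pair is assumed to be listed once.
    """
    degree = Counter()
    for a, b, score in edges:
        if score > min_score and a != b:
            degree[a] += 1
            degree[b] += 1
    # Sort by degree (descending), then by name so that ties are deterministic.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Placeholder network; the last edge falls below the confidence cut-off.
edges = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.7),
    ("P1", "P4", 0.5),
    ("P2", "P3", 0.45),
    ("P4", "P5", 0.2),
]
hubs = top_hubs(edges, top_n=2)
```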
Validation of Hub Genes
Receiver operating characteristic (ROC) curve analysis was performed for the overlapping genes, and the diagnostic accuracy of the hub genes was evaluated by calculating the area under the curve (AUC). The AUC is equivalent to the Wilcoxon–Mann–Whitney statistic [21]. The ‘caTools’ package in R was employed for the ROC analysis and to compute the AUC with its 95% confidence interval (CI). We used the Comparative Toxicogenomics Database (CTD), which integrates information on chemical-gene/protein interactions and on chemical-disease and gene-disease relationships, to develop hypotheses about the mechanisms of CAS [22]. We then analyzed the inference scores between the hub genes and the risk of atherosclerosis, coronary heart disease, and cardiovascular disease using CTD data.
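The equivalence between the AUC and the Wilcoxon–Mann–Whitney statistic can be made concrete with a short Python sketch: the AUC is the fraction of case-control pairs in which the case has the higher value, with ties counted as one half. The expression values below are invented for illustration, and the confidence interval computed in the study is not reproduced here.

```python
def auc_mann_whitney(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for c in case_values:
        for n in control_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Invented expression values of one gene in the two groups.
abnormal = [3.1, 2.8, 3.5, 2.2]
normal = [2.0, 2.5, 2.9]
auc = auc_mann_whitney(abnormal, normal)
```

For a gene that is lower in the Abnormal group this quantity falls below 0.5, so its discriminative ability is usually read as max(AUC, 1 - AUC).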
Weighted Gene Co-expression Network Construction
The gene expression matrix of GSE132651 (12,547 genes) was obtained after data preprocessing. Subsequently, we selected the 5,000 genes with the largest median absolute deviation and used Pearson’s correlation coefficient to cluster the samples in GSE132651. With a soft-thresholding power of β = 5, the network reached a fitting index of 0.89 while retaining the highest average connectivity. Next, the adjacency matrix and the topological overlap matrix were constructed. Hierarchical clustering was used to group genes with high topological overlap into a clustering dendrogram, which was then pruned using the dynamic tree cut method so that genes with high topological overlap were assigned to the same module. Finally, the similarity threshold was set to 0.25 to merge similar modules. Based on average-linkage hierarchical clustering and dynamic tree cutting, the genes were divided into seventeen modules. The network heatmap of 400 randomly selected genes indicated that the level of independence among these clusters was relatively high. The modules were all similarly sized, each branch had a clear outline, and modules with the same color showed excellent gene consistency, all suggesting that the scale-free network construction, the TOM construction, and the module classification were reliable. Based on the constructed modules, the correlation between each module and the phenotypic traits was analyzed and its significance was calculated. These results were used to plot the module-trait relationship heatmap. The cyan module exhibited a significant association with early CAS traits, indicating its role in disease pathogenesis and progression. Additionally, the absolute values of gene significance in the cyan module were the highest among all modules, further confirming the significant correlation between the cyan module and early CAS.
Thus, the cyan module was selected as a clinically important module and further analyzed. Finally, we plotted a scatter plot of the relationship between Gene Significance and Module Membership in the cyan module.
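For orientation, gene significance in WGCNA is commonly defined as the absolute correlation between a gene's expression and the trait of interest. The Python sketch below computes it for a binary Normal/Abnormal trait. The values are invented, the sample count is reduced for brevity, and module membership (correlation with the module eigengene) is not computed here.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def gene_significance(expression, trait):
    """Absolute correlation between one gene's expression and the trait."""
    return abs(pearson(expression, trait))

# Trait coding assumed for this sketch: 0 = Normal group, 1 = Abnormal group.
trait = [0, 0, 0, 1, 1, 1]
expression = [1.0, 1.2, 0.9, 2.0, 2.2, 1.9]  # invented values for one gene
gs = gene_significance(expression, trait)
```

Averaging this quantity over the genes of a module gives a module-level significance that can be compared across modules.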
GO and Pathway Enrichment Analyses
To determine the biological functions and signaling pathways of the cyan module, we conducted GO and KEGG analyses of its genes. The results indicated that the enriched biological processes were mainly involved in platelet activation, coagulation, hemostasis, and blood coagulation, whereas the enriched molecular functions were mainly involved in cadherin binding and SH3 domain binding. In addition, KEGG pathway analysis showed that the negative regulation of organelle organization, the response to drug, and chromosome segregation were the most enriched pathways, followed by transcription regulation involved in DNA replication, mitotic sister chromatid segregation, and G1/S transition of the mitotic cell cycle.
DEGs between Normal and Abnormal Group
By applying the eBayes function in the “limma” package, differential expression analysis was performed based on the empirical Bayes approach in R. The fold-change threshold was set at |log2FC| > 0.5, and the significance threshold was set at P < 0.05. A total of 81 genes were up-regulated and 79 were down-regulated. A volcano plot of all genes and an expression heatmap of the DEGs were generated.
Identification of Hub Genes and Overlapping the Key Module Genes with DEGs
By co-expression analysis, the cyan module (99 genes) was screened for genes with high module connectivity. We inputted these genes into the STRING database and constructed a PPI network via Cytoscape software. After overlapping the DEGs and cyan module genes, a total of 25 genes were screened out, and we identified the following genes as significant early CAS candidates by ‘cytoHubba’: ALCAM, BDNF, CRYL1, DYSF, FGF2, FST, LYN, SH3BP5, TGFB2, and TGFBR2.
Validation of the Hub Genes
To investigate the value of the candidate hub genes we identified, CTD was employed to explore the inference scores between these genes and CAS. Inference scores in CTD reflect the association between chemicals, diseases, and genes. The interaction results showed that ALCAM, BDNF, FGF2, FST, LYN, TGFB2, and TGFBR2 had higher scores with coronary artery disease, atherosclerosis, and cardiovascular diseases. In addition, the AUCs of the candidate hub genes were calculated in the GSE132651 dataset from the ROC curves. The AUC was calculated to distinguish the Abnormal group from the Normal group, and the AUC of each of the ten candidate hub genes was greater than 0.7. Hence, these ten genes were regarded as the true hub genes associated with early CAS.
The Therapeutic Targets for Statins
Previous studies have found that statins can improve early atherosclerosis. To confirm statin targets in treating early CAS, we further analyzed the dataset GSE32547, which includes human umbilical vein endothelial cells (HUVECs) and HUVEC cell lines treated with statins. After normalizing the data, we found that after statin treatment of HUVECs, the expression of BDNF, FST, and TGFB2 was significantly down-regulated, the expression of LYN and SH3BP5 was up-regulated, and the remaining five hub genes showed no statistically significant difference in expression. In contrast, in GSE132651 these five genes showed the opposite expression pattern in endothelial cells of patients with early atherosclerosis. These results suggest that statins may achieve their therapeutic effect by regulating the expression of BDNF, FST, TGFB2, LYN, and SH3BP5.
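The logic of this comparison, selecting hub genes whose direction of change under statin treatment is the reverse of their direction in the Abnormal group, can be written as a short Python sketch. The direction labels follow the findings reported in this paper; the data structures and the function name `reversed_by_treatment` are illustrative only, and "ns" marks genes without a significant change.

```python
OPPOSITE = {"up": "down", "down": "up"}

def reversed_by_treatment(disease_dir, treatment_dir):
    """Genes whose treatment direction is the opposite of their disease direction."""
    return sorted(
        gene
        for gene, direction in disease_dir.items()
        if treatment_dir.get(gene) == OPPOSITE.get(direction)
    )

# Direction in the Abnormal versus the Normal group (GSE132651), as reported.
disease = {"BDNF": "up", "FST": "up", "TGFB2": "up", "LYN": "down", "SH3BP5": "down"}

# Direction in statin-treated versus control cells (GSE32547), as reported.
statin = {
    "BDNF": "down", "FST": "down", "TGFB2": "down", "LYN": "up", "SH3BP5": "up",
    "ALCAM": "ns", "CRYL1": "ns", "DYSF": "ns", "FGF2": "ns", "TGFBR2": "ns",
}

candidate_targets = reversed_by_treatment(disease, statin)
```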
Early CAS has a high risk of developing into coronary heart disease, and its worldwide incidence is increasing yearly [23]. Although most patients use prescription medications or lifestyle changes to fight atherosclerosis, some patients eventually develop coronary heart disease or even acute myocardial infarction. Although various genes have been identified as being involved in atherosclerosis, the gene network associated with the development of early CAS has not been elucidated. We used WGCNA and DEG analysis to identify significantly up-regulated or down-regulated genes in the context of CAS pathogenesis. Some of these genes were hub genes related to early CAS. The identified genes with detectable expression could serve as potential therapeutic targets and are worthy of further study. Clinical studies have found that statins have anti-atherosclerotic effects. However, it is not clear how statins affect the development of early CAS. By analyzing the effect of statins on hub genes, we obtained information on potential statin therapeutic targets for the treatment of early CAS.
WGCNA is a systems biology method for describing the pairwise relationships among gene transcripts and is based on entire gene modules rather than focusing on individual genes [8]. Additionally, it uses soft thresholding instead of the standard hard thresholding to split modules and provides a better reflection of the actual biological network. Such an approach is more consistent with biological processes and could contribute to identifying therapeutic targets or candidate biomarkers. As far as we know, our study is the first to apply WGCNA to build an early CAS-related gene network. Through in-depth and systematic analysis of the GSE132651 dataset, we determined that the cyan module was significantly related to early CAS disease progression. GO and KEGG analyses were then used for annotation and visualization. The results demonstrated that the cyan module was enriched in blood coagulation, hemostasis, platelet activation, muscle tissue development, and developmental cell growth. The KEGG terms, which were associated with chromosome segregation, negative regulation of organelle organization, and DNA replication, pointed to important pathways in the pathogenesis of early CAS. When cholesterol homeostasis changes, DNA may begin to replicate uncontrollably, which then induces chromosome instability in endothelial and smooth muscle cells, destruction of mitotic spindles, and changes in other aspects of microtubule physiology that promote atherosclerosis progression [24]. The cell cycle may therefore be a potential mechanism for the development of early CAS. In our study, by integrating the hub genes from the cyan module with the DEG analysis, the following key genes involved in early CAS were identified: ALCAM, BDNF, CRYL1, DYSF, FGF2, FST, LYN, SH3BP5, TGFB2, and TGFBR2.
Statins are a class of lipid-lowering compounds [6] that regulate a series of processes: they reduce the accumulation of esterified cholesterol in macrophages, increase endothelial NO synthase, reduce inflammation, increase the stability of atherosclerotic plaques, and restore platelet activity and the coagulation process [25]. Previous studies have shown that statins can inhibit DNA replication in tumor cells and can suppress cell growth and induce apoptosis by reducing chromosomal instability both in vitro and in vivo [26,27]. To explore possible statin targets in treating early atherosclerosis, we analyzed the previously obtained hub genes in the GSE32547 dataset and found that three genes (BDNF, FST, and TGFB2) that were up-regulated in the GSE132651 dataset were down-regulated in endothelial cells treated with statins, while the two genes (LYN and SH3BP5) that were down-regulated in the Abnormal group were up-regulated in cells treated with statins.
BDNF is a growth factor produced primarily in the brain but is also distributed in multiple organ systems [28]. In the nervous system, BDNF is very important for neuronal function and survival during development and plays a critical role in neuroplasticity after brain injury [29,30]. Our study found that BDNF was up-regulated in patients with early atherosclerosis. Multiple growth factors and adipokines have been identified in atherosclerotic lesions, and neuroimmune mediators such as BDNF may be involved in their development. Studies have found that BDNF expression was significantly reduced in autopsy specimens of patients with CAS, in which neuroimmune pathways play an important role. Whether reduced levels of BDNF exert a positive or negative impact on the development or outcome of coronary atherosclerosis is unclear [31]. Down-regulation of BDNF was also found in the serum and peripheral blood mononuclear cells of patients with diabetes mellitus-accelerated atherosclerosis (DMAS), but in diabetes mellitus-accelerated atherosclerotic mice, BDNF overexpression promoted M2 macrophage polarization, which repressed the development of DMAS by inactivating the STAT3 pathway [32]. Additionally, we found that in endothelial cells treated with statins, BDNF was down-regulated (in contrast to the up-regulation seen in the Abnormal group). At present, there are few studies about statins’ effects on BDNF in CAS. Zhang et al. found that the serum BDNF levels in an atorvastatin-treated group were significantly higher than those in the control group; atorvastatin-related elevation in BDNF levels may promote functional recovery in atherothrombotic stroke [33]. To date, no studies have directly proved the role of BDNF in the treatment of early coronary arteriosclerosis. The potential biological pathways still need to be further investigated and clarified.
FST, also known as follistatin, is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release [34]. The expression level of FST was significantly increased in early CAS, which might contribute to the progression of atherosclerosis. Follistatin is a binding protein of activin A, which plays a biological role by blocking intrinsic activin A and can promote the formation of macrophage (Mφ) foam cells by regulating the expression of scavenger receptor mRNA [35]. It has been reported that the activin A-follistatin system plays an important role in the development of atherosclerosis [36]. Wu, et al. [37] found that the level of follistatin in the blood can be significantly reduced after short-term statin treatment, which is consistent with our findings. At present, no study has directly proved that FST is a target of statin therapy in the treatment of early atherosclerosis.
TGFB2, or transforming growth factor-β2, is a multifunctional inflammatory cytokine that is produced by many cell types, including leukocytes, macrophages, smooth muscle cells, and platelets [38,39]. TGF-β2 is synthesized as a precursor composed of a signal peptide, a latency-associated peptide (LAP), and a C-terminal fragment [40]. Atherosclerosis is widely considered to be a chronic inflammatory disease, and the role of TGF-β in its pathogenesis has been the subject of controversy for many years. Because it can promote fibrosis and inhibit endothelial regeneration, TGF-β is suspected of promoting atherosclerosis [41]. However, with in-depth analysis of TGF-β, several studies have discovered that it regulates many processes that limit the development of atherosclerosis, including lipid accumulation in the vessel wall and the inflammatory response [42]. Some studies have found that statins reduce atherosclerotic plaques and the endothelial inflammatory response in atherosclerotic rats through the TGF-β/Smad pathway [43]. Considering that statins can stabilize atherosclerotic plaques and inhibit inflammation, we speculate that statins could prevent the progression of atherosclerosis by regulating TGFB2 expression.
LYN encodes a tyrosine protein kinase that plays a key role in the immune response against pathogenic infection [44]. Lyn is biologically related to macrophage scavenger receptor class A (MSR-A) and CD40. In the absence of Lyn, peritoneal macrophages inhibit the proliferation of oxidized low-density lipoprotein, and stimulation of the CD40 pathway fails to induce expression of monocyte chemoattractant protein 1, which is related to atherosclerosis [45]. Our study showed that LYN was significantly under-expressed in endothelial cells of patients with early CAS, while its expression in endothelial cells increased significantly after statin treatment. Dianne, et al. [46] found that simvastatin can significantly reduce lymphocytes’ lipid raft levels, represented by Lyn and Fyn. Lymphocytes have been reported to be involved in the pathogenesis of atherosclerosis, so the effect of statins on lymphocytes may slow the progression of early CAS.
SH3BP5 is an SH3 domain-binding protein that preferentially binds to Bruton's tyrosine kinase (BTK), thus inhibiting BTK function [47]. SH3BP5 may be closely related to atherosclerosis. Some studies have found a close relationship between SH3BP5 and brain neuropeptide as well as the protective effects of nerves and blood vessels [48]. As a kind of bioactive peptide, brain neuropeptide plays an important role in inhibiting cytotoxicity caused by many kinds of injury [49]. In hypercholesterolemic apolipoprotein E-deficient mice, brain neuropeptides have been shown to protect the function of the vascular endothelium and affect atherosclerotic plaques [50]. SH3BP5 is an important factor affecting downstream brain neuropeptides, and it may also affect the progression of atherosclerotic plaques. Based on current research, SH3BP5 may be a new target for the development and treatment of early CAS. However, further investigation of its biological function is necessary.
In this study, we have identified 10 potential key genes related to early CAS and 5 genes that could be used as targets for statin therapy, suggesting that these genes can serve as candidate biomarkers and therapeutic targets. Although the present study is the first to investigate the co-expression gene networks associated with early coronary atherosclerosis using WGCNA, it also has some limitations. Firstly, the sample size is small, and larger samples are needed to confirm our results. Moreover, we did not further study the exact mechanism of the identified hub genes in early coronary atherosclerosis. Finally, further molecular experiments are needed to validate the data in early coronary atherosclerosis.
This was the first study to construct a co-expression network to explore early coronary atherosclerosis-associated modules and genes by WGCNA. We identified 10 hub genes that may regulate early coronary atherosclerosis and 5 hub genes that may play an important role in statin treatment. These findings enhance our understanding of the molecular mechanisms in early coronary atherosclerosis, although the exact molecular mechanism of hub genes and functional pathways warrants further exploration.
The authors have no conflicts of interest to declare.
Conception and design: M Y, Jq Y; Administrative support: Q H; Collection and assembly of data: Q H; Data analysis and interpretation: Jq Y; Manuscript writing: M Y, Jq Y; Final approval of manuscript: All authors.
This work received funding from Health and Family Planning Commission of Jiangxi Province (No.202130053).
We thank everyone who contributed to this work. In addition, we thank WORDVICE (https://wordvice.cn/) for providing excellent language editing.
Publicly available datasets were analyzed in this study. These data can be found here: https://www.ncbi.nlm.nih.gov/gds/?term=GSE132651 and https://www.ncbi.nlm.nih.gov/gds/?term=GSE32547.