The Key Pathways and Born Genes of Bronchopulmonary Dysplasia

The Key Pathways and Born Genes of Bronchopulmonary Dysplasia. In order to identify the potential therapeutic target genes of Bronchopulmonary dysplasia (BPD). Methods: GSE25293 identified the hub gene of the core module by weighted co-expression network analysis (WGCNA); the GSE25286 microarray data screened genes with the same trend through the Series Test of Cluster (STC), and used the limma algorithm to screen differentially expressed genes (DEGs), and the difference gene was analyzed with the target gene. Results: Autophagy, EMT, cell cycle, proliferation, VEGF and Notch pathway, and found the potential 8 target gene of BPDs. Conclusion: BPD-related mechanisms and hub genes, speculated that Prokr2 / VEGF pathway, IFIT1 or IFIT3 / EMT pathway and Agt / VEGF pathway may be important pathways for BPD.


Introduction
Bronchopulmonary dysplasia (BPD), also known as Chronic lung disease (CLD), is a chronic lung disease caused by a combination of various factors, impaired alveolar and pulmonary vascular development [1]. In preterm infants with gestational age <32 weeks, the incidence of BPD was 12% to 32% [2], and ultralow birth weight infants (ELBWI) (birth weight <1000g) and very early preterm infants (EPI) (fetal age <28 weeks) BPD incidence rate of up to 50% [3]. In recent years, due to various environmental and social factors and prolonged oxygen exposure, prenatal glucocorticoid application, continuous improvement of mechanical ventilation technology etc., especially the widespread application of Pulmonary surfactant (PS), neonatal classic "BPD" "The new type of "BPD" has been transformed, and the survival rate of premature infants has increased. At the same time, the incidence of BPD has increased, which seriously affects the quality of life and long-term prognosis of preterm infants. Currently, there is no effective method for preventing or treating BPD in clinical practice [4]. The new BPD has become a problem that can't be ignored in neonatal intensive care unit, which seriously affects the prognosis and quality of life of children.
It is currently considered that BPD is a multifactorial disease.
Severe BPD mortality rate of up to 25%(2019), survival children had a higher risk of diseases such as retinopathy of premature infants and high airway response, and the incidence of cognitive, educational, and behavioral disorders in children with BPD were also significantly higher [5]. Therefore, BPD has become one of the leading causes of disease and death in premature infants, and research on the pathogenesis, prevention and treatment of BPD has become a hot topic in the field of neonatal care. It is urgent to study its pathogenesis, prevention and treatment. With the development of genomics and epigenomics, modern medicine has entered the era of precision medicine. Through the bioinformatics analysis of BPD, many molecular markers related to the occurrence, metastasis, drug resistance and prognosis of BPD were found. Further research based on genomic big data is of great significance for finding the target of diagnosis, treatment and prognosis of BPD.
Currently, microarray technology has become an indispensable tool for detecting genomic expression levels in specific organisms.
Gene Expression Omnibus (GEO) was created in 2000 as a global resource for international public repository and gene expression research. It can upload and download genomic data sets acquired by microarrays, next-generation sequencing and other forms of high-throughput functions [6]. The advantage of integrated bioinformatics analysis is that it can overcome sample heterogeneity and platform variability and integrate data from different independent microarray studies to obtain more clinical samples for more reliable analysis. This study screened the differential published gene (DEG) using the limma differential screening algorithm and standardized methods in the GSE25286 microarray data downloaded from GEO. Weighted Gene Co-Expression Network Analysis (WGCNA) can be used to explore the relationship between genes and phenotypes, which can be analyzed by transforming gene expression data into co-expression modules.
The WGCNA method can be used to mine specific disease-related modules and genes associated with specific traits, and to further discover related genes and predict gene function by analyzing hub genes.
WGCNA is widely used in a variety of biological process analysis to deal with differentially expressed genes and gene interactions to identify therapeutic targets or candidate biomarkers. This study is based on the BPD gene expression microarray data GSE25293, which uses WGCNA to construct a co-expression module to screen hub genes for specific modules. The Series Test of Cluster (STC) is the discovery of the trend characteristics of gene expression, and the genes of the same change characteristics are concentrated in a trend, so as to find the most representative gene group in the process of experimental changes, revealing that the biological sample is in the process of change. In this study, a trend analysis was performed on the BPD gene expression microarray data GSE25293, and genes that met the trend of change were screened for further analysis.
Finally, we screened the differential genes generated by WGCNA and time series trend analysis, and analyzed the correlation between the genes whose expression trends increased or decreased, and the related genes of the traits we were concerned with and analyzed the gene interaction network to find the hub gene.

Microarray Data
The GSE25293 microarray data contributed by Jie et al. was selected in the GEO database (http://www.ncbi.nlm.nih.gov/ geo/) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) and GSE25286 microarray data (https://www.ncbi.nlm.nih.gov/geo/ query/acc.cgi?acc=GSE25286), both data BPD were created using neonatal mouse hyperoxic mode. The model, which was exposed to hyperoxia for 14 days and 29 days, was subjected to mRNA and miRNA microarray analysis of gene expression profiles, and was based on the GPL1261 platform (Affymetrix Mouse Genome

Weighted Gene Co-Expression Network Analysis (WGC-NA)
A gene co-expression network was constructed for GSE25293 microarray data using the WGCNA package in R. To create a network with a near-scaled topology, we installed a soft threshold power of β = 4 (scale free R2 = 0.88). The adjacency matrix was calculated and converted into a topological overlap matrix (TOM).
Gene module was detected using a dynamic tree cutting algorithm.

Series Test of Cluster(STC)
Trend analysis (STC) of the GSE25286 data set, when the organism is in a certain order or stimulated by the external environment, the changes in gene expression had different trend characteristics. These genes with the same changing characteristics were clustered in a trend graph. The purpose of STC analysis was to find the trend characteristics of gene expression and to find the most representative gene group during the sample sequence change, with P < 0.05 as the threshold. The STC algorithm of gene expression kinetics was used to profile the gene expression time series to determine the most likely cluster set. The method explicitly considers the dynamic characteristics of gene expression profiles during clustering and identifies multiple different clusters.
Differentially expressed genes were selected according to RVM corrected analysis of variance (ANOVA). Based on the trend of gene signal density under different conditions, we identified a unique set of expression trends. The original expression value was converted to a log2 ratio. Using a clustering strategy for timeseries gene expression data, we defined a unique spectrum. The expression profile was related to the actual or expected number of genes assigned to each profile. Significant profiles had a higher probability than Fisher's exact test and multiple comparison tests.

Differential Expression Gene Analysis
The gene expression matrix of the GSE25286 microarray data was downloaded from GEO and screened for DEGs using the limma package in R 3.4.1. The DEG was down-regulated and the DEG was down-regulated according to the expression of DEG. Where |log2FC (fold change) |>0 and P<0.05 were used as interception criteria for screening of DEGs.

Target Gene Association Analysis
We have collected a wide range of target genes that have been extensively studied and studied by their mechanisms of action, and classified them as apoptosis, Epithelial interstitial transformation (EMT), progression, G0_G1 (Cell cycle-chromatin, protein replication) and G2_M (Cell Cycle-division) and Immune microenvironment. For each type, we calculated the correlation between the studied gene and the target gene based on the expression profile data and performed statistical tests. A scatter plot was drawn for each group of statistically significant genes, and all correlations under a certain classification were shown in heat maps.

Gene Interaction Map
Firstly, in the STRING online database (http://string-db. org), the difference genes between WGCNA and time series trend analysis and the differential analysis of the expression trend of the rise or fall of the DEGs were analyzed by gene interaction network, and then according to KEGG The inter-genetic relationship between the genes recorded in the database was used to generate a table of inter-genetic signal transduction network relationships using GeneAct software. Based on the signal transduction relationship of genes, a gene interaction network was constructed. Each node represents a protein or gene or other molecule, the edge between nodes represents their interaction, and degree represents their number of connections.

WGCNA Analysis
A total of 45,101 genes expression profiles were obtained by data analysis, and the low expression genes were filtered to obtain 21,815 highly expressed genes. The sample clustering and trait association of gene expression levels were calculated. The results of Figure 1A indicated that the gene clustering tree of each tissue can correspond well with the tissue. By filtering the weight values, finally selecting β=4 to construct the network, using the dynamic cut tree method to merged similar modules, and obtaining 9 coexpression modules, different colors represent different modules, and Grey modules were a group of unallocated. Gene collection to other modules. The number of genes in the module was clustered according to the expression level, and the genes with higher clustering degree were assigned to a module, and clustering analysis was performed according to the inter-gene expression level, and the correlation between the modules is obtained.
The graph, whose dendritic diagram showed a module that was significantly associated with the clinical phenotype of BPD ( Figure   1B), randomly selected 1000 genes in the heatmap ( Figure 1C).
Eigengene dendrogram and heatmap were used to identify related eigengenes. By observing the absolute value between the module and the sample time characteristic of Figure 1D, the absolute values of the module and turquoise modules and sample time characteristics were as high as 0.86 and 0.97, of which the brown module had a negative correlation trends; the turquoise module had a positive correlation trend, and selected these two modules were tissue specific modules. Tissue-specific modules may be involved in tissue-specific biological processes. To validate the reliability of co-expression network construction and tissue-specific modules, these two specific modules were analyzed.

Series Test of Cluster
The gene expression dynamics STC algorithm was used to analyze the genes of the GSE25286 microarray data (p < 0.05) and to determine the most likely profile in the time series. 21,815 differentially expressed genes were placed into eight expression pattern spectra using STC analysis (Figure 2). The horizontal axis  (B) The eigengene dendrogram and heatmap identify groups of correlated eigengenes termed meta-modules. The dendrogram indicated that the Turquoise module and Brown module were highly related to the BPD, among them, brown module was negatively correlated. Turquoise module has a positive correlation trend. The heatmap in the panel showed the eigengene adjacency.
(C) Visualizing 1,000 random genes from the network using a heatmap plot to depict the TOM among the genes in the analysis. The depth of the red color was positively correlated with the strength of the correlation between the pairs of modules on a linear scale. The gene dendrogram and module assignment were shown along the left side and the top.
(D) Module-trait relationships. Each row corresponds to a module eigengene, each column corresponded to a trait(WT,P14_ BPD and P29_BPD), and each cell consists of the corresponding correlation and P-value, which were color-coded by correlated according to the color legend. Among them, the Brown and Turquoise modules were the most relevant modules to the BPD mice models. Therefore, the genes in profile 1, profile 6, profile 7, and profile 0 were confirmed to be more meaningful and valuable.

Differentially Expressed Gene
The microarray GSE25286 data set sample BPD was modeled in neonatal mouse hyperoxia mode, including 2 controls, 3 hyperbaric treatments for 14 days, and 3 hyperoxic treatments for 28 days. The differentially expressed genes between the groups were calculated by the R software limma package. The upper and lower DEGs were screened by |log2FC |> 1 and FDR < 0.05. The results are shown in Table 1.

Target Gene Association Analysis
To further guide the relevant studies, we collected apoptosis, epithelial interstitial transformation (EMT), proliferation, G0_G1 (Cell cycle-chromatin, protein replication), G2_M (Cell cycle-division), vascular endothelial growth factor (VEGF), Notch pathway and Immune microenvironment related genes, and classified according to different functions and periods of action.
Correlation analysis was performed on the genes of interest and the target genes of different classifications to further speculate on the potential function of the differential genes. From Figure 4, we can visually see the correlation between the gene of interest and the target gene of this type of function (or period) and speculate on its possible mechanism of action. Figure 4: Figure 4A-G were the correlation analysis between the intersection gene and the target genes of apoptosis, EMT, GO-G1, G2-M, proliferation, VEGF pathway and Notch pathway, in which the upper Figure 4A-E were the up-regulated gene and the lower was the down-regulated gene; Figure 4F-G left The side was the up-regulated gene and the right side was the down-regulated gene. In the correlation graph, the left side was the pathway gene of the group, the top was the research gene, the vertical axis was the expression value of the target gene, and the horizontal axis was the expression value of the gene of interest, and each point indicated the correlation between the two genes. The color of the point represented the correlation coefficient size, the legend on the right.
In the correlation analysis with the target gene, it was found that the correlation between the up-regulated cross-over gene and the

Gene Interaction Network Map and Hub Gene
The difference analysis overlapped the up-regulated genes and the intersection down-regulated genes interaction analysis, and we screened the results with a correlation score greater than 0.9. In order to more intuitively observe the interaction between the genes of interest and understand the mechanism of action in different situations, we used GeneAct software to map the inter-

Discussion
In this study, the GSE25293 microarray data from GEO was analyzed by weighted gene co-expression network. The gene expression profile of the microarray data was divided into 9 modules, and the correlation analysis between modules and BPD clinical target traits was carried out. Screen out the key modules of the brown module and the turquoise module and the hub genes in the module. At the same time, the GSE25286 data set was analyzed for STC and gene expression differences. The above three methods were used to screen the differentially expressed genes for up-regulation and down-regulation, and 24 were obtained.
The crossover genes were up-regulated, and 20 cross-over genes were down-regulated. In order to further speculate the potential function of differentially intersecting genes, the correlation between differentially intersecting genes and target genes was analyzed, and it was speculated that apoptosis, EMT, proliferation, However, studies have found that in U937 cells, ectopic expression of lfit3 leaded to increased accumulation of cells in G(1)/S transition and growth arrest [18]. Interferon induced protein with tetratricopeptide repeats 1 (Ifit1) encoded a protein containing a tetrapeptide repeat, a member of the interferoninducible protein family, with a tetrapeptide repeat (lfits) possibly involved in inhibition of viral replication and translation initiation [19]. Its product p56 protein was involved in inhibiting translation initiation, antiviral action and inhibiting cell migration and antiproliferative effects [20]. Importantly, lfit1 contained multiple repeats of the four-reverse transcript repeating helix-turn-helix motif, which readily bound to various proteins and mediated protein-protein interactions, involving translation initiation, double-stranded RNA signaling, and cell Migration and proliferation [21]. Studies have shown that lfit1 or lfit3 overexpression may mediate EMT through EGFR activation, and EMT played an important role in BPD progression [22]. The protein encoded by P2Y receptor family member 10 (P2ry10) belongs to the G protein coupled receptor family, and the G protein coupled receptor was preferentially activated by adenosine and uridine nucleotides.
Masanori et al. found that the P2Y (10)  AGT overexpression leaded to inflammatory response through JAK/ STAT signaling pathway, which ultimately inhibited proliferation of A549 cells and induced BPD [25].
Abdelwahab et al. found that AGT and ANGII were involved in bleomycin-induced Bleo-induced lung injury in neonatal lungs and suggested that angiotensin system may be a potential target for the treatment of neonatal lung injury [26]. Prokineticins was a secreted protein that promotes angiogenesis and induces contraction of the gastrointestinal smooth muscle. Prokr2 encoded a membrane protein and a G protein-coupled prokineticins receptor. The study found that Prokr2 mutations were associated with hypopituitarism during embryonic pituitary development, which in turn affected neonatal development [27]. Chauvet et al. found that prokineticin receptor 2 (Prokr2) was an important G protein-coupled receptor that bound to angiogenic factors such as endocrine gland vascular endothelial growth factor (EG-VEGF) and mammalian Bv8 [28].

Conclusion
In summary, biological processes such as apoptosis, EMT, cell cycle, proliferation, VEGF pathway and Notch pathway may be involved in the occurrence of BPD, and lfit3, lfit1, P2ry10, P2ry14, Agt, Stfa2l1, Prokr2 And 8 genes of Stfa3 may be potential targets for the treatment of BPD. However, due to experimental conditions, the relationship between gene expression profiles and more individual agents remains to be further studied.