Analysis of the Significant Genes with Poor Prognosis in Pancreatic Ductal Adenocarcinoma by Integrated Bioinformatics

Purpose: Pancreatic ductal adenocarcinoma (PDAC) is a common digestive system malignancy with complex pathogenesis. The purpose of this study is to identify important genes associated with poor prognosis and their potential mechanisms. Materials and Methods: The gene expression profiles GSE62452, GSE41368 and GSE28735 were obtained from the GEO database. The three datasets contain 120 PDAC and 112 normal samples in total. GEO2R and Venn diagram software were used to screen the differentially expressed genes (DEGs) between PDAC and normal samples. Next, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to analyze KEGG pathways and Gene Ontology. Cytoscape, together with the Search Tool for the Retrieval of Interacting Genes (STRING), was then used to visualize the protein-protein interactions (PPI) of these DEGs. Results: A total of 19 genes were consistently differentially expressed in the three datasets, of which 3 were up-regulated and 16 were down-regulated; they were enriched in biological processes, cellular components and molecular functions. Analysis of the PPI network with the Molecular Complex Detection (MCODE) plug-in screened out 10 down-regulated genes. Kaplan Meier analysis of the overall survival associated with these genes showed that one gene was associated with significantly poor prognosis. Validation of the gene expression profile analysis showed that the expression of the ALB gene in PDAC was lower than that in normal tissues. High expression of the ALB gene was related to lower overall survival of patients with PDAC. Conclusion: On the basis of comprehensive bioinformatics methods, ALB is a down-regulated DEG in PDAC whose high expression is associated with poor prognosis, and it may become an important target for the diagnosis and treatment of PDAC.

In the past few decades, the progress of PDAC treatment has been very slow. In 2006-2012, the 5-year survival rate of PDAC was 9% [5]. Therefore, it is important to find new biomarkers to predict the prognosis and improve the survival rate of patients with PDAC.
Gene chips can detect differentially expressed genes quickly and have been proven to be a reliable technology for more than ten years [6]. In addition, microarrays generate large amounts of chip data that are stored in public databases. Hence, on the basis of these data, a wealth of valuable clues can be unearthed for new research [7]. In recent years, several bioinformatics studies have been carried out on PDAC [8,9], which show that integrated bioinformatics methods are helpful for further study and exploration of the potential mechanism of PDAC.
In this study, GSE62452, GSE41368 and GSE28735 were first selected from the Gene Expression Omnibus (GEO). Secondly, the GEO2R online tool and Venn diagram software were used to obtain the common differentially expressed genes (DEGs) in the above three datasets. Thirdly, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to analyze these DEGs, including Biological Process (BP), Cellular Component (CC), Molecular Function (MF), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Fourthly, a Protein-Protein Interaction (PPI) network was established to analyze the DEGs and to identify core genes using the Cytoscape MCODE (Molecular Complex Detection) app. In addition, the core DEGs were imported into the Kaplan Meier plotter online database to obtain significant prognostic information (P < 0.05). Moreover, the expression of DEGs between PDAC and normal samples was confirmed by Gene Expression Profiling Interactive Analysis (GEPIA) (P < 0.05). As a result, only 10 DEGs qualified. Then, we carried out KEGG pathway enrichment analysis for these 10 DEGs. Finally, 6 genes (CELA3A, CEL, PNLIPRP1, CELA2B, CELA2A and CTRL) were found to be significantly enriched in the pancreatic secretion pathway. In conclusion, our bioinformatics analysis suggests that ALB may be a useful biomarker and an effective target in patients with PDAC, with low ALB expression indicating a better prognosis.

Microarray Data Information
NCBI-GEO is a free public database of microarray and gene expression profiles. We obtained the gene expression profiles of GSE62452, GSE41368 and GSE28735 from it (Table 1).

Data Processing of DEGs
The DEGs between the PDAC samples and the normal samples were identified with the GEO2R online tool, using |logFC| > 2 and adjusted P value < 0.05 as thresholds [10]. The resulting data, in TXT format, were then checked online with Venn software to detect the DEGs common to the three datasets. DEGs with logFC < 0 were considered down-regulated genes, and those with logFC > 0 were considered up-regulated genes.
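The screening and overlap steps described above can be illustrated with a minimal Python sketch. It is not the GEO2R or Venn software itself: the row layout (gene, logFC, adjusted P value), the function names `select_degs` and `common_degs`, the gene labels and all numbers are hypothetical, and requiring the same direction of change in every dataset is an assumption made here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: (gene symbol, logFC, adjusted P value).
Row = Tuple[str, float, float]

def select_degs(rows: List[Row], lfc_cutoff: float = 2.0,
                p_cutoff: float = 0.05) -> Dict[str, str]:
    """Keep genes with |logFC| > lfc_cutoff and adjusted P < p_cutoff."""
    degs = {}
    for gene, log_fc, adj_p in rows:
        if abs(log_fc) > lfc_cutoff and adj_p < p_cutoff:
            # logFC > 0 means up-regulated, logFC < 0 means down-regulated.
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs

def common_degs(per_dataset: List[Dict[str, str]]) -> Dict[str, str]:
    """Genes that are DEGs in every dataset (the centre of the Venn diagram).

    Assumption: a gene only counts if its direction agrees in all datasets.
    """
    first = per_dataset[0]
    shared = set(first)
    for degs in per_dataset[1:]:
        shared &= set(degs)
    return {g: first[g] for g in shared
            if all(d[g] == first[g] for d in per_dataset)}

# Invented toy tables standing in for the three GEO2R exports.
ds1 = [("GENE_A", -3.1, 0.001), ("GENE_B", -2.5, 0.01),
       ("GENE_C", 2.4, 0.03), ("GENE_D", 1.2, 0.001)]
ds2 = [("GENE_A", -2.2, 0.004), ("GENE_B", -2.8, 0.02),
       ("GENE_C", 2.1, 0.20)]
ds3 = [("GENE_A", -2.6, 0.01), ("GENE_B", -1.9, 0.001),
       ("GENE_C", 3.0, 0.001)]

common = common_degs([select_degs(d) for d in (ds1, ds2, ds3)])
print(common)
```

In this toy input only GENE_A passes both thresholds in all three tables, so it is the only gene kept.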

Gene Ontology and Pathway Enrichment Analysis
Gene Ontology (GO) analysis is a commonly used method for annotating genes and their RNA or protein products and for identifying the characteristic biological features of high-throughput transcriptome or genome data [11]. KEGG is a collection of databases covering genomes, diseases, biological pathways, drugs and chemical substances [12]. DAVID is an online bioinformatics tool designed to identify the functions of a large number of genes or proteins [13]. We used DAVID to examine the enrichment of the DEGs in BP, CC, MF and KEGG pathways.
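The idea behind an enrichment P value can be sketched with a plain hypergeometric over-representation test. This is a simplified stand-in, not DAVID's own procedure (DAVID reports a modified Fisher exact score), and the background size and term size used in the example are assumed values, not figures from this study.

```python
from math import comb

def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: number of background genes, K: background genes annotated to the term,
    n: size of the DEG list, k: DEGs annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k + 1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Assumed example: 6 of 19 DEGs fall in a term that covers 100 of
# 20000 background genes (both background numbers are hypothetical).
p = hypergeom_pvalue(k=6, n=19, K=100, N=20000)
print(p < 0.05)
```

A term is then called enriched when this P value falls below the chosen threshold, usually after correction for the number of terms tested.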

Protein-Protein Interaction Network and Module Analysis
Protein-Protein Interaction (PPI) information can be evaluated with an online tool, STRING (Search Tool for the Retrieval of Interacting Genes) [14]. The STRING app in Cytoscape was then applied to examine the potential associations between these DEGs (maximum number of interactors = 0 and confidence score ≥ 0.4) [15]. In addition, the MCODE app in Cytoscape was used to detect the modules of the PPI network (degree cutoff = 2, max. depth = 100, k-core = 2, node score cutoff = 0.2).
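Two of the settings above, the confidence score threshold and the k-core value, can be illustrated with a short Python sketch. It is not the MCODE algorithm, which additionally weights nodes and grows clusters from seed nodes; it only shows how low-confidence edges are dropped and how a 2-core is obtained by repeatedly removing nodes with fewer than two neighbours. The protein labels, scores and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, confidence score)

def build_graph(edges: Iterable[Edge], min_score: float = 0.4) -> Dict[str, Set[str]]:
    """Undirected adjacency sets, keeping edges with score >= min_score."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def k_core(graph: Dict[str, Set[str]], k: int = 2) -> Set[str]:
    """Nodes left after repeatedly deleting nodes with degree below k."""
    core = {node: set(neigh) for node, neigh in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, neigh in core.items() if len(neigh) < k]:
            for neigh in core[node]:
                core[neigh].discard(node)  # detach the node from its neighbours
            del core[node]
            changed = True
    return set(core)

# Invented toy network: a triangle plus two weakly attached proteins.
edges = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P1", "P3", 0.7),
         ("P3", "P4", 0.5), ("P4", "P5", 0.3)]
core_nodes = k_core(build_graph(edges), k=2)
print(sorted(core_nodes))
```

Here the P4-P5 edge is discarded by the score filter, P4 is then left with a single neighbour and is pruned, and the triangle P1, P2, P3 remains as the 2-core.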

Survival Analysis and RNA Sequencing Expression of Core Genes
Kaplan Meier plotter is a widely used web tool for evaluating the effect of a large number of genes on overall survival (OS), based on the EGA, TCGA and GEO (Affymetrix microarrays only) databases [16]. The log-rank P value and the hazard ratio (HR) with its 95% confidence interval are calculated and shown on the plot.
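For readers unfamiliar with the survival curves produced by such a tool, the Kaplan Meier (product-limit) estimator can be sketched as follows. This is a generic illustration with invented follow-up times, not the web tool's code, and it omits the log-rank test and hazard ratio that are used to compare high-expression and low-expression groups.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with a death.

    events[i] is True when the death was observed at times[i] and
    False when the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

# Invented follow-up data (months) for five patients.
times = [5, 8, 8, 12, 20]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

With these toy data the estimated survival drops at months 5, 8 and 12, to roughly 0.8, 0.6 and 0.3.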

GEPIA is a web tool for analyzing the RNA sequencing expression data of thousands of samples from the GTEx project and TCGA [17].

Identification of DEGs in PDAC
There were 120 PDAC and 112 normal pancreas specimens in this study. Using the GEO2R online tool, we extracted 33, 591 and 57 DEGs from GSE62452, GSE41368 and GSE28735, respectively. We then used a Venn diagram online tool to identify the DEGs common to these three datasets. In total, 19 DEGs were detected in PDAC, including 3 up-regulated genes (logFC > 0) and 16 down-regulated genes (logFC < 0) (Table 2 and Figure 1).
The GO enrichment results for these DEGs are given in Table 3, and the results of the KEGG analysis are shown in Table 4. The down-regulated DEGs were particularly enriched in pancreatic secretion, fat digestion and absorption, protein digestion and absorption, and glycolipid metabolism (P < 0.05), while the up-regulated DEGs showed no significantly enriched signaling pathway.

Analysis of Core Genes by the Kaplan Meier Plotter and GEPIA
Kaplan Meier plotter was used to obtain the survival data of the 10 core genes. The results showed that one of the 10 genes, ALB, was associated with significantly worse OS (P < 0.05, Table 5). GEPIA was then used to compare the expression of the ALB gene between PDAC and normal samples. Our results showed that high ALB expression in PDAC patients was associated with worse OS (P < 0.05, Figure 3A). In addition, the expression of ALB in PDAC samples was lower than that in normal pancreas samples (P < 0.05, Table 5 and Figure 3B).

Table 5: The prognostic information of the 10 key candidate genes.

Category | Genes
Gene with significantly worse OS (P < 0.05) | ALB

Figure 3: a) The prognostic information of ALB. The Kaplan Meier plotter online tool was used to identify the prognostic information of the 10 core genes, and 1 of the 10 genes had a significantly worse OS rate (P < 0.05).
b) ALB was differentially expressed in PDAC patients compared with healthy people. The GEPIA website was used to further compare the ALB expression level between PDAC and normal samples. The result showed a significant difference in expression level between PDAC specimens and normal specimens (*P < 0.05). Red indicates tumor tissues and grey indicates normal tissues.

Re-Analysis of 10 Selected Genes via KEGG Pathway Enrichment
In order to understand the possible pathways of the 10 selected DEGs, the enrichment of KEGG pathway was re-analyzed by DAVID.
The results showed that 6 genes (CELA3A, CEL, PNLIPRP1, CELA2B, CELA2A and CTRL) were significantly enriched in the pancreatic secretion pathway (P < 0.05, Table 6 and Figure 4). 4 genes (CELA3A, CELA2B, CELA2A and CTRL) were significantly enriched in the fat digestion and absorption pathway, and 3 genes (CEL, CLPS and PNLIPRP1) were significantly enriched in the protein digestion and absorption pathway (P < 0.05, Table 6).
We then used the Kaplan Meier plotter for analysis and found that 1 of the 10 genes was significantly associated with survival. GEPIA analysis then showed that the expression of this gene in PDAC samples was lower than that in normal samples (P < 0.05). ALB is the main protein in human plasma and is also considered an important indicator of nutritional status and a powerful predictor of poor prognosis in patients undergoing major surgery [18]. ALB has been considered an endogenous antioxidant, which plays a role in many physiological and pathological processes and exerts anti-carcinogenic effects [19]. The lower serum ALB level in cancer patients may be due to the persistent systemic inflammatory response to aggressive, metabolically active tumors. A population-based prospective study showed that a higher level of ALB was associated with a lower risk of breast cancer [20]. Previous studies [21][22][23] reported that a low ALB level was a reliable risk factor for poor prognosis of pancreatic cancer. This may emphasize the importance of metabolic changes in the natural history of pancreatic cancer [24]. In addition, some studies have identified clinicopathological prognostic factors associated with serum ALB levels in PDAC patients, including the C-Reactive Protein (CRP)-to-albumin ratio [25] and the Modified Glasgow Prognostic Score (mGPS) [26].
However, no study has reported that the ALB gene in tumor tissue predicts survival in patients with PDAC. In this study, the ALB gene was down-regulated in PDAC compared with normal pancreas. This may be because, under normal conditions, only hepatocytes and HCC cells express ALB mRNA [24,25]. It is worth noting that, in our study, high ALB expression was significantly associated with poor prognosis in patients with PDAC. Whether this is related to the progression of PDAC is uncertain, and more research is needed in this area. Taken together, bioinformatics analysis of three sets of PDAC microarray data showed that high expression of ALB in PDAC tissue was related to poor survival.
This may provide useful information for the study of potential biomarkers and the biological mechanism of PDAC. However, the molecular mechanism and biological function of the ALB gene, and whether it can be used as a new potential biomarker or therapeutic target for PDAC patients, need further study.