Hui Liu1, Xin Wang3 and Xin Li2,3*
Received: December 17, 2025; Published: December 22, 2025
*Corresponding author: Xin Li, Department of Infection and Immunodeficiency, Beijing Ditan Hospital, Capital Medical University, Peking University Ditan Teaching Hospital, Beijing 10015, China
DOI: 10.26717/BJSTR.2025.64.010022
Background: Autoimmune hepatitis(AIH) is a chronic progressive inflammatory disease of the liver. Although
several genetic factors have been reported in the pathogenesis of AIH, many details remain unknown.
Methods: GSE140249 and GSE159676 datasets were downloaded from the GEO dataset. The Weighted Gene
Co-expression Network Analysis (WGCNA) and LASSO algorithm were used to identify the miRNAs closely related
to the occurrence of AIH. Both miRBase and miRWalk databases were used to predict potential mRNAs
regulated by AIH related miRNAs, and the intersection was regarded as candidate genes.
Results: A total of 470 differentially expressed(DE) miRNAs were identified in GSE140249. The brown module
of WGCNA was considered as the most critical module related to the occurrence of AIH, containing 100 miRNAs.
The LASSO further identified thirteen miRNAs. Combined with the DE mRNAs in GSE159676, fourteen mRNAs
were identified. Then, the ten candidate genes were found to be enriched in gene ontology (GO) database, mainly
in biological process and cell components. In addition, seven pivotal genes, DYNC1H1, ALG1, PTPRB, SMAD2,
MICA, GP2 and SPN were identified, and their otherness was confirmed in rat AIH model.
Conclusions: The integrative analysis firstly revealed that miRNA-mRNA might be a crucial susceptible factor
for AIH and identified that DYNC1H1, ALG1, PTPRB, SMAD2, MICA and GP2 could be used as potential therapeutic
targets and diagnostic biomarker.
Keywords: Autoimmune Hepatitis; Diagnostic Biomarker; WGCNA; LASSO; Bioinformatics Analysis
Abbreviations: AIH: Autoimmune Hepatitis; WGCNA: The Weighted Gene Co-expression Network Analysis; DE: Differentially Expressed; GO: Gene Ontology; GEO: Gene Expression Omnibus; LASSO: Least Absolute Shrinkage and Selection Operator; DEGs: Differentially Expressed Genes; FC: Fold Change; MM: Module Membership; GS: Gene Significance; MF: Molecular Function; BP: Biological Process; AST: Aspartate Transaminase; CC: Cellular Component; SD: Standard Deviation; EMT: Epithelial-Mesenchymal Transition; LIHC: Liver Hepatocellular Carcinoma; CDG: Congenital Disorders Of Glycosylation
Autoimmune Hepatitis (AIH) is a world public health problem with a high risk of progression to liver cirrhosis [1]. It is mainly characterized by elevated autoantibodies, hyper gamma-globulinaemia and interface hepatitis [2]. AIH imposes a major health burden globally, owing to its high incidence in developed countries and the increasing prevalence in developing countries [3]. The pathophysiology of AIH is multifaceted and not completely clear. Environmental factors, genetic susceptibility, molecular mimicry and dysfunctional immune responses have been suggested to contribute to AIH pathogenesis [4]. Generally speaking, genetic susceptibility is considered to be the most common pathogenic factor leading to AIH [5]. Furthermore, some studies have been conducted in a large sample to explore new and existing AIH gene biomarkers [6]. A number of different methodologies are used to identify AIH related candidates, ranging from single-gene studies to genomics analyses [7]. It is worth considering that new methodologies are developed to explore more AIH biomarkers, which will be helpful to understand the biology of AIH by using genetic, epigenetic, and transcriptome investigations. Another area of concern is that AIH is often found in the stage of cirrhosis, missed the optimal treatment opportunity. Most patients with AIH respond well to standard immunosuppressive therapy with steroids and azathioprine, while about 10%-20% of patients have insufficient response to standard treatment or have intolerable side effects [8,9].
Once diagnosed with the end stage of liver cirrhosis, liver transplantation is regarded as the final treatment scheme [10]. There is an urgent need to identify new biomarkers in patients with AIH to better detect people at early stage of AIH, so that effective therapies can be used to curb disease progression. The bioinformatic analysis of miRNA/mRNA sequences draws increasing attention in non-tumor disease research recently [11]. A series of online databases and tools have contributed a lot to understanding and predicting disease development. The Gene Expression Omnibus (GEO) database freely distribute high-throughput molecular abundance data [12]. The miRbase database is a searchable database of published miRNA sequences and annotation [13]. And the miRWalk database stores experimentally verified miRNA-target interactions [14]. These databases provide access to publicly available data for researchers. Weighted Gene Co-expression Network Analysis (WGCNA), as a method to screen disease-related modules, is the most common and useful method for discovering the link between key genes and clinical characteristics [15]. The least absolute shrinkage and selection operator (LASSO) is another algorithm to obtain a more refined model by constructing a penalty function [16]. However, there are still little researches on AIH. Here, the key genes are screened by differential expression miRNAs analysis, and WGCNA combined with LASSO, in order to explore the key genes associated with AIH.
These miRNA target mRNAs are intersected with the differentially expression mRNA in another dataset. For the obtained mRNAs, it is further validated the key genes in the rat AIH model, so as to explore the critical genes and provide candidate markers for the diagnosis of AIH.
Datasets
GSE140249 (platform: GPL21263) [17] and GSE159676 (platform: GPL6244) [18] were downloaded by “GEOquery” R package from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih. gov/geo/). The dataset GSE140249 contains 49 AIH human serum samples and 27 normal human serum samples, which tested the microarray through non-coding RNA profiling. The dataset GSE159676 contains 3 AIH liver tissue samples and 6 normal human liver tissue samples, which contains the mRNA array using the Affymetrix Human Gene 1.0 st. It is worth mentioning that we followed the methods of Jianbo Qing, et al. 2022.
Differentially Expressed Genes Analysis
Differentially expressed genes (DEGs) were identified by “edgeR” R package in R Studio software (Version: R 3.5.1). It is considered DEGs as FDR <0.05 and |log2Fold Change (FC)| > 1.5, in which log- 2FC > 1.5 thinks that the expression of gene is up-regulated, and log- 2FC<-1.5 thinks that the expression of gene is down-regulated. DEGs were visualized as volcano and heatmap by the “ggplot2” package in R Studio software.
Weighted Gene Co-Expression Network Analysis
Weighted Gene Co-Expression Network Analysis (WGCNA) is a well-established method for analyzing the relationship between diseases phenotype and genes. The dataset GSE140249, which had a large sample size, was chosen to find key genes related AIH phenotype by WGCNA in R studio. The cluster diagram was drawn to select the appropriate threshold to eliminate outliers. A scale-free network was constructed by calculating different power value (range from 1 to 20), and the suitable power value was chosen when scale free R2 was 0.8. Subsequently, we performed the cluster analysis to recognize the highly similar modules and the min module size set as 30. The connection between modules and traits were calculated to identify the key trait-module genes. In the module-trait analysis, the module genes were shown with module membership (MM) and gene significance (GS), so as to analysis the significance of high connectivity genes in modules.
miRNA Prediction
miRNA-mRNA target regulation network was predicted by miRbase (https://www.mirbase.org/) and miRwalk (http://mirwalk. umm.uni-heidelberg.de/) database. The miRNAs of interation in databases above were as the predicted miRNA targeted mRNA. The miRNA-gene network was visualized by Cytoscape software (Version: 3.9.1).
Functional Enrichment Analysis
Online tool DAVID (https://david.ncifcrf.gov/) was used to perform the Gene Ontology (GO) with the threshold of P-value <0.05 [19]. GO analysis including molecular function (MF), biological process (BP) and Cellular Component (CC). The records of GO analysis were visualized as chord chart by “ggplot2” R package.
Animal Model of Autoimmune Hepatitis
SD Rats were acquired by the Spelfor (Beijing) biotechnology company. 5-6 rats were group housed in a temperature- and humidity- controlled, specific-pathogen-free animal facility at 25°C under a 12 h–12 h light–dark cycle with free access to food and water. For the AIH study, rats (aged 6-8 weeks) were injected with ConA (16mg/kg) through tail vein injection to establish AIH model. Rat in the control group received an equal amount of 0.9% normal saline. Rats anesthetized with pentobarbital sodium were sacrificed after 16-24h treatment, the blood and liver tissues were collected. Serum alanine aminotransferase (ALT) and aspartate transaminase (AST) levels were measured using ALT/AST detection kit (Applygen, E2021/E2023). The liver tissues were obtained for hematoxylin and eosin (H&E) staining and RT-qPCR analysis.
RNA Extraction and RT-qPCR
RNA was extracted from the liver tissues using TRIzol reagent (Dakewe Biotech Co., 8034011) and reverse-transcribed to cDNA using the Takara PrimeScriptTM RT reagent Kit (Takara, RR037A). Primer sequences were obtained from PrimerBank (https://pga.mgh. harvard.edu/ primerbank/) and synthesized at SYNBIO Technologies. Ltd. (Suzhou, China) (Table 1). RT-qPCR was performed using the Applied Biosystems™ PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific, A25776) on an ABI Applied Biosystems. RT-qPCR amplification was conducted in triplicate for each sample, and the expression of target genes was normalized to GAPDH. Relative expression was determined using the 2-△△ Ct method. Each experiment was repeated more than three times.
Statistical Analysis
The R Studio (version 3.5.1) was used to generate images. Graph- Pad Prism 8.0 was used to perform statistical analysis and drawing. Unpair t-test was selected as the statistical method. P-value < 0.05 was considered as statistically significant. Data was presented as mean ± standard deviation (SD).
Identification of Differentially Expressed miRNA Between AIH and Healthy Control
The study whole analysis process is presented in Figure 1. GSE140249 was analyzed by “edgeR” package in R with FDR <0.05 and |log2FC| > 1.5 as the threshold. In dataset GSE140249, 470 differentially expressed miRNAs (DE miRNAs) were identified, which contained 249 up-regulated genes and 221 down-regulated genes (Figure 2A & 2B). All the DE miRNAs in the datasets were respectively performed clustering and shown in the form of heatmaps (Figure 2A). The top 10 DE miRNAs in the dataset were shown in Table 2.
Construction of Co-expression Module and Identification of Key Trait Module miRNAs
WGCNA was performed to excavate key genes significantly related to traits by combining with DEGs. 25 sample was regarded as outlier removed in dataset GSE140249 under the cut height of 12000 (Figure 3A). Sample clustering tree and phenotypic heatmap were shown in Figure 3B. The darker the color, the higher the expression in phenotypic heatmap, and the grey module represented missing values. The top 75% genes of the median absolute deviation in this dataset were screened for WGCNA. When the soft-thresholding power (β) was equal to 5, the evaluation parameter R2 was greater than 0.85 (Figures 3C & D). A scale-free network was constructed with the soft-thresholding power of β = 5 and the genes were classified into different modules according to expression profiles. A total of 4 consensus co-expression modules were identified via the hierarchical clustering (Figure 3E). The genes which could not been classified into modules belong to grey module. 100 miRNAs were extracted in brown module, which had the most significant gene significance (GS) in AIH (Figure 3F). The scatter diagram was drawn to reveal the relationship between GS and module membership (MM) (Figures 3G, cor = 0.87, P = 6.2e-31). Above results show that brown module was highly correlated with the occurrence of AIH, and 100 miRNAs contained in which were considered as the trait-module key miRNAs.
Construction of the miRNA Signature Based on LASSO Regression
Among the total of 100 miRNAs, LASSO regression analysis was performed for further filtration. The regression coefficient diagrams were present including Lasso coefficient profiles and cross-validation parameters. Coefficient profiles decreased as the lambda value decreased (Figure 4A & B). The best lambda was 0.049. In total, 13 miRNAs were retained including hsa-miR-122-5p, hsa-miR-154-3p, hsa-miR-188-5p, hsa-miR-302c-5p, hsa-miR-515-3p, hsa-miR-602, hsa-miR-675-3p, hsa-miR-4772-5p, hsa-miR-5587-5p, hsa-miR- 6836-5p, hsa-miR-7152-3p, hsa-miR-7854-3p, hsa-miR-513c-5p. The miRNA–gene interaction table was downloaded using the miRbase and miRwalk database. Two miRNA–gene interaction networks were constructed by Cytoscape software (Figure 4C & D). GSE159676 was analyzed by “edgeR” package in R with FDR <0.05 and |log2FC| > 1.5 as the threshold. In dataset GSE159676, 383 DEGs were identified, which contained 1 up-regulated genes and 382 down-regulated genes (Figure 5A). Based on the miRbase and miRwalk database, the miRNA–mRNA interaction was constructed. It was found that a total of 14 DEGs were screened out in miRbase, miRwalk and GSE159676 dataset, which was shown by Venn diagram in Figure 5B. In order to further learn the function of AIH-related genes, 10 differential mRNAs (DE mRNAs) were found to be enriched in gene ontology (GO) database, mainly in biological process (BP) and cell components (CC), including axon cytoplasm, cell surface, integral component of membrane and anterior/posterior pattern specification (Figure 5C).
Validation of the Key mRNA in Rat AIH Model
To further validate the role of the 10 DE mRNAs in AIH, we prepared the Con A-induced rat AIH model. Some studies have reported that Con A could induce AIH by activating immune cells, mainly T lymphocyte. H&E staining analysis revealed that Con A (16mg/kg) induced rat displayed infiltration of inflammatory cells and interface necrosis (Figure 6A). Compared to the control group, the Con A group had the higher level of ALT and AST (Figure 6B & C). As expected, qPCR showed that DYNC1H1, ALG1, PTPRB, SMAD2, MICA and GP2 were lowly expressed in Con A induced rat AIH model, SPN was highly expressed, while there was no significance in TRAM2, AP3M2 and HOXA3 (Figure 6D).
There are few biomarkers related to AIH, and the diagnosis of AIH mainly depends on the change of autoantibodies and invasive tests. In this study, comprehensive bioinformatics methods were used to mine AIH-related databases, including WGCNA, LASSO algorithm and the prediction of target genes. A lot of mRNAs related to AIH were screened by the bioinformatics analysis, and which was further validated in the rat AIH model. DYNC1H1, ALG1, PTPRB, SMAD2, MICA and GP2 were lowly expressed in AIH, while SPN was highly expressed in AIH. It could provide a solid foundation for the following researches. There are some studies on the identified mRNAs. DYNC1H1 (Dynein cytoplasmic 1 heavy chain 1, DYNC1H1) was located in several cellular components, including manchette, neuronal cell body and nuclear envelope [20]. Studies reported that DYNC1H1 was associated with liver hepatocellular carcinoma (LIHC), and which might play an important role in the progression of LIHC through epithelial- mesenchymal transition (EMT) and immune response [21]. ALG1 (chitobiosyldiphosphodolichol beta-mannosyltransferase, ALG1) is an essential mannosyltransferase involved in N-glycosylation [22,23]. Many studies have found that new glycosylation-related gene mutations could lead to diseases. Glycoprotein and glycolipid synthesis are one of the main functions of the liver. Congenital disorders of glycosylation (CDG) could affect the structure and function of liver and gallbladder, which can lead to fatty liver, liver fibrosis and abnormal bile duct development [24].
PTPRB (protein tyrosine phosphatase, receptor type, B, PTPRB) was predicted to be involved in angiogenesis and inactivation of protein kinase activity [25]. It was reported that miR-665 could targeted PTPRB promoting lung cancer and tumor cell migration, invasion, and proliferation [26]. SMAD2(SMAD family member 2, SMAD2) was involved in regulation of gene expression, and transforming growth factor beta receptor signaling pathway [27]. It has been reported that SMAD2 was closely related to a variety of liver diseases, such as liver fibrosis, liver cancer, primary biliary cholangitis (PBC), etc [28]. Another study constructed the adult mice of hepatocyte-specific deletion of Smad2, and showed that SMAD2 could inhibit the growth and dedifferentiation of hepatocytes [29]. MICA (Major Histocompatibility Complex Class I-Related Chain A) was associated with various liver disease, such as nonalcoholic fatty liver disease, liver fibrosis and hepatocellular carcinoma [30-31]. Glycoprotein 2 (GP2) is a sensitive marker in primary sclerosing cholangitis, which could provide strong evidence for AIH [32]. SPN (Sialophorin, SPN) was closely related to the cellular immunity, including T cell receptor signaling pathway, monocyte activation, and thymocyte aggregation [33]. The above analysis results show that AIH-related mRNA predicted by bioinformatics has been studied in various liver diseases, but there is no AIH-related research. It is worthy to be further verified the role of these mRNAs in the occurrence and development of AIH.
Immune-mediated hepatocyte injury is the main pathogenesis of AIH. Recent studies reported that abnormal immune tolerance was closely related to AIH, especially in T lymphocyte [34]. At present, the therapeutic targets for AIH mainly focus on immunity, such as glucocorticoid and azathioprine [35], but new targets are urgently needed for patients who are respond or tolerate. Therefore, future research needs to reveal more genetic susceptibility sites of AIH from three different perspectives: genetics, immunity and environment, so as to provide more effective means for accurate clinical intervention [36]. Bioinformatics is a multi-disciplinary frontier cross-discipline, which has the advantages of massive and diversified data acquisition and analysis. It has greatly impacted the traditional mode of medical research activities, and can provide a direction for developing new therapeutic targets [37,38].
In conclusion, this integrative analysis indicated that comprehensive bioinformatic analysis could provide a recommendable way to identify key genes related to nontumor disease [39]. Seven key genes correlated with AIH were identified, which also was validated in rat AIH model. DYNC1H1, ALG1, PTPRB, SMAD2, MICA, GP2 and SPN may play key roles in the occurrence and development of AIH.
Ethics Approval and Consent to Participate
All methods were carried out in accordance with relevant guidelines and regulations.
Consent for Publication
Not applicable to this research.
Conflicts Of Interest
The authors declare that they have no competing interests.
Data Availability
The data used in this research were obtained from the GEO (https://www.ncbi.nlm.nih.gov/geo/) database (GSE140249 and GSE159676), all of which are available in publicly available databases. This study complies with its data use and publication rules.
