ABSTRACT
Background: breast cancer is classified in three subtypes based on molecular
and clinical parameters: luminal type, HER2 and basal type. Luminal type carcinomas
(the most frequent, approximately 65% of the total) are characterized by a particular
biological behavior, with a recovery from the disease in 40-50% of cases and death in
about 75% of these 5 years after diagnosis, despite an initial apparent low aggressiveness.
Further biomolecular parameters are needed for this category, which, starting from an
estimated cure of the disease, can help us guide therapeutic choices by modeling them
on actual needs.
Methods: the aim of this work is to build a panel of genes that can allow us to
personalize therapy and hopefully reduce mortality. The kit we patented includes about
33 genes that better characterize the heterogeneity of the neoplasm and its sensitivity
to drugs. The study was based on the sequencing of the total transcriptome (RNA) of 40
patient samples collected in the two-year period 1994/1995. The aim was to identify a
group of genes that are particularly differentially expressed in patients with a good and
those with a poor prognosis.
Results: RNA was extracted from formalin-fixed and paraffin-embedded samples
and then prepared the library for RNA-Seq. The study revealed some genes such as the
life CXCL13, the IFITM10 death gene, present regardless of the molecular subtype, and
the DSCAM-AS1 gene, specific for the Luminal subtype A. These genes, if present, allow
the patient to avoid standard PBI and therapy neoadjuvant.
Conclusion: the goal is the implementation of a different surgical and adjuvant
personalized therapy tailored to each individual patient.
Keywords: Breast Cancer; Genes; Personalized Therapy; Surgery; Oncology
Abbreviations: ER: Estrogen Receptor; PR: Progesterone Receptor; FISH: Fluorescence in Situ Hybridization; HER2: Human Epidermal Growth Factor Receptor 2; IHC: Immunohistochemistry; PBI: Partial breast irradiation
Introduction
Today, breast cancer is distinguished into at least three different
subtypes based on clinical and molecular parameters: luminal,
erbB2, and basal type, which exhibit different biological behaviors
and prognoses. Correctly identifying the molecular subtype of
the tumor opens the door to new, increasingly adequate and
targeted therapeutic possibilities for the treatment of the specific
molecular subtype [1]. In this scenario, luminal carcinomas (which
represent approximately 65% of the total) are distinguished by a particular heterogeneity of biological behavior, with recovery
of disease in approximately 40-50% and death in approximately
two-thirds of these patients 5 years after diagnosis, despite initial
anatomopathological pictures of apparent low aggressiveness [2-
9]. Precisely for this diagnostic category, biomolecular parameters
derived from the genome/transcriptome that are capable of
orienting the therapeutic choices in a more precise and personalized
way on the patient’s actual therapeutic needs are desirable
[10,11]. Currently, the clinical (T, nodal status) and biopathological
(hormonal status) parameters obtained from membrane receptor
expression provide prognostic information and an indication of any
adjuvant systemic chemoradiotherapy treatment [12-14]. However,
adjuvant therapy reduces the risk of recurrence by only 25-30%
[13]. These data are probably due to:
1. Clinicopathological parameters of stratification of the risk of
disease recovery are not sufficiently adequate for the patient’s
prognostic framework.
2. Adjuvant therapies are not specific enough toward the cells
responsible for disease recovery [15,16].
Knowing the details of the mutations of every single tumor
allows us to predict the biological behavior of that neoplasm and
to adequately stratify the risk. Genetic tests, such as Mammaprint
and Oncotype DX, EndoPredict [17,18], assist clinicians in
choosing the most suitable adjuvant treatment by analyzing the
expression profile of genes involved in the metastasis process (St
Gallen 2017). These tests are very expensive and often have to be
sent to foreign laboratories. The genetic profile is of the utmost
importance in the evaluation of parameters already known as the
expression of hormonal receptors and HER2; these are currently
determined with immunohistochemistry (IHC) or FISH methods,
which provide information about the morphological expression
of the receptor but not about their functional state. In any case,
knowing that the receptor is expressed at the membrane level is
not sufficient information to guarantee the effectiveness of the drug
addressed to it because that protein may not be functionally active.
Therefore, it is necessary to ascertain the functional activation of
the gene responsible for the synthesis of the protein to guarantee
its functionality, more than its presence. Moreover, from the
gene expression profile, additional information that specifically
correlates the expression of some genes to the response to
individual therapies can be obtained; for example, in HER2-positive
patients, the high expression of IGF1R correlates with resistance to
Herceptin, as well as the hyperexpression of CCNE1; instead, ER+
patients with high PDGFRA expression are resistant to tamoxifen
treatment [14,19,20]. The use of gene profiling tests offers the
opportunity for a more adequate risk stratification, an improvement
of the therapeutic planning and the clinical outcome, avoiding what
happens today, with the standard clinical-pathological criteria,
namely, the undertreatment of approximately 20% of women with
grade 1 breast cancer and overtreatment of approximately 15% of
women with grade 3 breast cancer
Methods
This retrospective study aimed to observe the difference in the receptor state obtained from the genome compared to that obtained with traditional immunohistochemistry. Analyzing biological material from formalin samples from 1994/1995 from two groups of selected patients with breast cancer, one consisting of patients still alive and the other of patients with a survival of less than 4 years. Survival data of a group of patients with a long follow-up period (RTUP) suffering from breast cancer diagnosed in the twoyear period 1994/1995 were obtained with the authorization of the Ethics Committee of Umbria (CEAS). The group, consisting of 55 patients, was then divided into 2 numerically balanced subgroups: the first of 30 patients staged alive on 31/12/2013 with survival of >20 years and the second of 25 patients staged in death with lower survival 4 years after diagnosis (Table 1). The relative tissue samples of the respective patients fixed in formalin and included in paraffin (FFPE) were subsequently collected and stored in the histoteca of the SC of Anatomy and Pathological Histology of the Hospital of Perugia. Twenty-two patients who had the respective tissue sample unsuitable for lack of biological material in the original inclusion block were excluded. The remaining samples, for a total of 33 cases, were all characterized again from a biopathological point of view through a dedicated immunohistochemical panel: ER, PgR, Ki67, HER2 and subjected to microscopic evaluation by a pathologist who was thus able to identify two groups of carcinomas: luminal and nonlluminal (according to the San Gallen criteria 2011 and following).
The recharacterization provided for the preparation of 1 slide
for hematoxylin and 4 slides with blank sections necessary for the
immunohistochemical panel for each sample. The pathologist made
the diagnosis based on current guidelines and verified that in each
section stained with the hematoxylin of each patient there were at
least 30% of the neoplastic cells out of the total. This percentage
figure is necessary to maximize the extractive yield of total RNA.
In practice, for samples with a number of neoplastic cells ≥ 30%
of the total, it was not necessary to proceed with macrodissection.
In the first phase of the experimentation, so with the 33 starting samples, the enrichment of the sample was not necessary, but 2/4
sections of the FFPE fabric of 10 μm thickness were cut in sequence
on microtome and placed in 1.5 ml safe look tubes ready to be
extracted. Total RNA, after appropriate quantitative and qualitative
evaluation, was used for the preparation of cDNA for sequencing
purposes. The library used is Illumina’s TruSeq Total RNA. In the
second phase of the experimentation, 2 pilot samples were used to
validate the panel of genes obtained with the first phase; they were
homogeneous both as biopathological characteristics of the lesion
(early breast cancer; tumor diameter; lymph node status; state of
HER2 and so on) and as the age of the patients. In the end, for each
patient, we obtained 2 tubes of tumor tissue and 2 tubes of healthy
tissue ready to be extracted. The extraction kit used to obtain total
RNA was the same as that used in phase I of the trial. Total RNA,
after appropriate quantitative and qualitative evaluation, was used
for the preparation of cDNA for sequencing purposes. The library
used was Illumina’s TruSeq RNA Access, suitable for processing
RNA extracted from paraffin samples. Total RNA was extracted
from sections of paraffin tissue using the “Tissue Preparation
Reagents” kit - Sividon Diagnostic, from the Pathological Anatomy
Section of the Santa Maria Della Misericordia Hospital. The chosen
extraction method was previously “validated” on 2 samples with
characteristics of “age of the sample”, “type of tissue” and “origin”
identical to the samples to be used for this work. Validation test of
the extraction method suitable for the preparation of the “TruSeq
RNA Sample Prep” library from RNA for sequencing.
For the deconvolution of the data from the HiSeq sequencer, to
process them, the following steps were carried out:
a. Demultiplexing: phase necessary to attribute to every single
sample its respective data (1 sample = 1 fastq file).
b. Fastqc: phase in which the quality control of the sequencing is
carried out through the use of the “fastq file for reads quality”
tool.
c. Trimming: delicate but essential phase of the deconvolution
process because it eliminates the reads and low-quality
fragments from the Fast file.
d. Mapping: important phase of the process because for each
sample a same file (also called bam) is built which specifies
how the reads align on the reference genome.
e. Count table assembly: a final phase that includes all the
information from the same file of the analyzed samples and
related “count reads” in a single table.
Based on the reference human genome, the following are the
data of the number of reads that align: Human Genome Assembly
(GRCh37/hg19). These data are used as a criterion for deciding
which samples will be analysed.
Samples with a low quantity of exons were excluded: a low
number of reads mapped to exonic regions <12%.
Two R/Bioconductor 3.2.2 packages were used for statistical
analysis:
1. DESEq2 (Love MI, Huber W, Anders S. Moderated estimation
of fold change and dispersion for RNA-seq data with DESeq2.
Genome Biol. 2014; 15 (12): 550.)
2. edgeR (Robinson MD, McCarthy DJ, and Smyth GK (2010).
“edgeR: a Bioconductor package for differential expression
analysis of digital gene expression data.” Bioinformatics, 26,
pp. -1)
The edgeR package was used on the data analyzed with DESeq2
to confirm the results obtained with this calculation algorithm. Both
are based on the negative binomial distribution but use different
correction tests. The use of the two packages allows obtaining a
list of genes that for both calculation algorithms are differentially
expressed between the samples.
The statistical significance that will be used in the two packages
is shown below:
1. DESeq2: P-value <0.05; P adjusted value <0.1
2. edgeR: P-value <0.05; False Discovery Rate <0.1.
An analysis of the network of genes significantly differentially
expressed in the two groups was also conducted using GeneMania
(http://www.genemania.org). This network was built based on the
interactions between the genes, as reported by results published
in the literature. Features such as gene coexpression, proteinprotein
interactions, physical interactions between genes and
other functions as described in the studies indicated in the table
compared to the network figure are evaluated. Although paraffin
introduced non negligible “background noise” in the analysis, the
number of reads obtained was satisfactory to conduct a differential
expression analysis using a “Read Counter” approach. It is possible
to evaluate the formation of 2 small subclusters, and the analysis of
differential expression with this dataset allows us to differentiate
25 genes differentially expressed between the 2 groups. The
hierarchical cluster of significant genes shows a homogeneous
trend among the 2 groups except for 2 luminal samples, which,
although clustered with nonluminal samples, are very close to
samples of the same stage (sample 7506 luminal staged due to
death is very close to sample 2626 nonluminal staged due to
death; sample 10429 luminal staged in life is close to sample 262
nonluminal staged in life). The IFIT3 and MX1 genes, among the
25 differentially expressed genes, have a strong interactome with
other genes not present in the list but strongly correlated with
each other in biological processes.From a careful observation of the PCA luminal vs nonluminal of the samples selected in Figure 1, it is
possible to observe the formation of 2 groups of samples that do
not comply with the luminal/nonluminal classification provided,
as shown by the PCA. Although paraffin introduced nonnegligible
“background noise” in the analysis, the number of reads obtained
was satisfactory to conduct a differential expression analysis
using a “Read Counter” approach. It is possible to evaluate the
formation of 2 small subclusters, and the analysis of differential
expression with this dataset allows us to differentiate 25 genes
differentially expressed between the 2 groups. The hierarchical
cluster of significant genes showed a homogeneous trend among
the 2 groups. Among the genes, the FGFR3 gene is highlighted: a
2012 paper may be useful in which the following is stated: “FGFR3
activation in MCF7 cells stimulated activation of the mitogenactivated
protein kinase (MAPK) and phosphoinositide 3-kinase
(PI3K) signaling pathways, both of which have been implicated in
tamoxifen resistance in breast cancer”. It should be considered that
it is 4 times less expressed in “luminal” cases than in “nonluminal”
cases.
Results
From the comparison of the results obtained with the two
methods of statistical analysis, 62 significant genes emerged
between living and deceased patients (Figure 2). In particular, 12
overexpressed genes emerged in live patients, and 50 overexpressed
genes emerged in deceased patients. The 8 intersecting genes of the
diagram are IFITM10, CEACAM7, KCNE4, OGDHL, NPY1R, SNCG,
ARHGAP23, and PIEZO2. The unique DESeq 2 gene is RBFOX1.
In this study, the data of the 26 samples (previous analysis)
were analyzed again to highlight the genes that are differentially
expressed in the luminal (life/death) and in the nonlluminal (life/
death) samples using the bioinformatic tools edgerR and DESeq2
in the R/Bioconductor environment using the following settings:
DESeq2 pValue <0.05, P adjusted value 0.1; edgerR P-value 0.05,
FDR 0.1. The parameters are those used in point 4.2 so that the
results can be compared.
The final result is divided into some innovative cornerstones:
from the 28 overexpressed genes (23 in the luminal and 5 in the
nonluminal), we found that
1. DSCAM-AS1 is a specific gene of the luminal A subtype that is
not present in healthy tissue or preneoplastic lesions. If this
gene is present, the patient could avoid standard PBI and
adjuvant therapy.
2. ER+ patients (good prognosis) died because of a high risk for
the overexpression of CCND1, INST4, and GAB2.
3. ER+ patients (good prognosis) died because the overexpression
of FOXJ3, SOX2, and FGFR3 (4 times less expressed in the
luminal region) correlated with the failure of endocrine
therapy.
4. in HERB2+ patients, we found other genes, among which
PAX8-AS1 was responsible for stem cell proliferation.
5. The fusion of IFITM10 and CTSD promotes cell proliferation of
the tumor and is a tumor marker.
The study highlighted a code of 33 genes that characterize
breast cancer. Forty-three percent of the isolated genes were
common between the 1st and 2nd sequences. Seven of our genes
are found in the commercial genetic tests PAM50, Oncotype and
Endopredict: four are present in the Panel Cancer targets 50 genes
of the Ion AmpliSeq. The verification of the validation of the 28 genes
selected from phase I in a more recent sample of women (2005) and
characterized by the molecular split point (Endopredict®) allowed
us to provide the study with greater consistency (Table 2).
Conclusion
A very positive result was the ability to extract suitable RNA
from 1994 samples in paraffin in good quantities and qualities to
make the study possible. In this regard, experiments have been
conducted to define the best protocol between RNA extraction
and library preparation. The calculation of the RNA (RNA integrity
number) and the concentration at Qubit allowed us to always
“select” valid samples. The basic hypothesis of the study has been
confirmed: the characterization of the luminal and nonluminal
tumors is not real through IHC (surrogate St. Gallen), which is
routinely used but must be read on a molecular basis. However,
the most relevant data are represented by the 33 overexpressed
genes: 28 genes (23 in the luminal and 5 in the nonluminal) and 5
genes (which confirm the premises of the study in wanting to find molecular markers capable of “personalizing” the therapy). Two of
the 28 genes were always present in both groups: a CXCL13 life gene
and an IFITM10 death gene. Moreover, IFITM10 and CTSD fusion
promotes cell proliferation of the tumor and is a tumor marker. The
DSCAM-AS1 gene is specific to the luminal A subtype. If this gene is
present, the patient could avoid standard PBI and adjuvant therapy.
The result obtained, which can be assumed to be transformed
into a genetic panel, following validation on more recent samples
(2005) and studied with Endopredict®, will help us to implement
a personalization of the therapy: surgical and adjuvant. In the
preoperative phase, with the core biopsy of the neoplasm, the
histological diagnosis is obtained, and then the biopathological
characterization and the presence of the genes described above are
verified to evaluate the risk of local and/or systemic recurrence,
which varies according to the molecular subtype.
The hyperexpression of CCND1, INST4 and GAB2 changes the
prognosis of ER+ patients from favorable to inaustic. Our work also
revealed that ER+ patients died (in which we would have expected a
good prognosis) because they had overexpressed FOXJ3, SOX2, and
FGFR3 genes that correlate with the failure of endocrine therapy.
In the postoperative phase, targeted therapy allows a better
stratification of adjuvant therapies based on the amplification of
the genes that regulate, for example, resistance to tamoxifen and/
or to trastuzumab.
Declaration Section
We give our consent for the publication of identifiable details, which can include case history and/or details within the text (“Material”) to be published in the above journal and article. The datasets generated and/or analyzed during the current study are not publicly available due to privacy reasons but are available from the corresponding author on reasonable request.
Ethics Approval and Consent to Participate
The study protocol was approved by official authorization of the Ethics Committee of the Health Authorities of Umbria (CEAS) n. 2682/15.
Patent Title
“Method for carrying out breast cancer prognosis, Kit and use of these” (original: “Metodo per effettuare prognosi del cancro della mammella, Kit ed uso di questi”) Ministry of Economic Development, General Directorate for the protection of industrial property, Italian Patent Office, n° 102017000109459, 18.02.2020.
Competing Interests
All authors declare that they have no competing interests.
Funding
This study was supported by the Italian League for the fight against cancer - LILT (Ministero della Salute, 5 x l000 year 2014).
Authors’ Contributions
AR designed the study, reviewed and analyzed the clinical data and did the critical revision of the manuscript; LF reviewed and analyzed the clinical data and did the critical revision of the manuscript; SZ data manager; PC reviewed and analyzed the clinical data and did the critical revision of the manuscript; FS enrolled the patients; CP performed the NGS, analyzed the data and drafted the manuscript; AS performed the NGS, analyzed the data and drafted the manuscript; FB reviewed the manuscript. All authors read and approved the manuscript’s final version.
Acknowledgment
Innovation Centre of Genomics, Genetics and Biology SCaRL Perugia.
References
- Swaby RF, Cristofanilli M (2011) Circulating tumor cells in breast cancer: a tool whose time has come of age. BMC Med 21: 9:43
- Associazione Italiana Registro Tumori AIRTUM 2016
- Sant M, Chirlaque Lopez MD, Agresti R, Sánchez Pérez MJ, Holleczek B, et al. (2015) EUROCARE-5 Working Group:. Survival of women with cancers of breast and genital organs in Europe 1999-2007: Results of the EUROCARE-5 study. Eur J Cancer 51(15): 2191-2205.
- Martin M, Brase JC, Calvo L, Krappmann K, Ruiz-Borrego M, et al. (2014) Clinical validation of the EndoPredict test in node-positive, chemotherapy-treated ER+/HER2- breast cancer patients: results from the GEICAM 9906 trial. Breast Cancer Res 16(2): R38.
- Crabb SJ, Cheang MC, Leung S, Immonen T, Nielsen TO, et al. (2008) Basal breast cancer molecular subtype predicts for lower incidence of axillary lymph node metastases in primary breast cancer. Clin Breast Cancer 8(3): 249-256.
- Dengel LT, Van Zee KJ, King TA, Stempel M, Cody HS, et al. (2014) Axillary dissection can be avoided in the majority of clinically node-negative patients undergoing breast-conserving therapy. Ann Surg Oncol 21(1): 22-27.
- Morrow M (2013) Personalizing extent of breast cancer surgery according to molecular subtypes. Breast 22 (Suppl 2): S106-109.
- Meyers MO, Klauber-Demore N, Ollila DW, Amos KD, Moore DT, et al. (2011) Impact of breast cancer molecular subtypes on locoregional recurrence in patients treated with neoadjuvant chemotherapy for locally advanced breast cancer. Ann Surg Oncol 18(10): 2851285-7.
- Filipits M, Rudas M, Jakesz R, Dubsky P, Fitzal F, et al. (2011) A new molecular predictor of distant recurrence in ER-positive, HER2-negative breast cancer adds independent information to conventional clinical risk factors. Clin Cancer Res 17(18): 6012-6020.
- Dubsky P, Filipits M, Jakesz R, Rudas M, Singer CF, Austrian Breast and Colorectal Cancer Study Group (ABCSG) et al. (2013) EndoPredict improves the prognostic classification derived from common clinical guidelines in ER-positive, HER2-negative early breast cancer. Ann Oncol 24(3): 640-647.
- Slodkowska EA, Ross JS (2009) MammaPrint 70-gene signature: another milestone in personalized medical care for breast cancer patients. Expert Rev Mol Diagn 9(5): 417-422.
- Early Breast Cancer Trialists' Collaborative Group (EBCTCG) (2005) Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomized trials. Lancet. 14-20;365(9472): 1687-1717.
- Ludovini V, Antognelli C, Rulli A, Foglietta J, Pistola L, et al. (2017) Influence of chemotherapeutic drug-related gene polymorphisms on toxicity and survival of early breast cancer patients receiving adjuvant chemotherapy. BMC Cancer 17(1): 502.
- Van Agthoven T, Sieuwerts AM, Meijer D, Meijer-van Gelder ME, Van Agthoven TL, et al. (2010) Selective recruitment of breast cancer anti-estrogen resistance genes and relevance for breast cancer progression and tamoxifen therapy response. Endocr Relat Cancer 17(1): 215-230.
- Rulli A, Listorti C, Foglietta J, Burattini M, Caracappa D, et al. (2015) Impact of genetic signature on breast cancer therapy: preliminary experience. Minerva Med 106(5): 309-313.
- Rulli A, Antognelli C, Siggillino A, Talesa V, Zayik S, et al. (2020) Liquid Biopsy in early Breast Cancer: a preliminary report. Ann Oncol 3(1): 8-8
- Rulli A (2017) “Experience with Endopredict in breast unit”, AIS Firenze.
- Dubsky P, Brase JC, Jakesz R, Rudas M, Singer CF, et al. (2013) The EndoPredict score provides prognostic information on late distant metastases in ER+/HER2- breast cancer patients. Br J Cancer 109(12): 2959-2964.
- Antognelli C, Del Buono C, Ludovini V, Gori S, Talesa VN, et al. (2013) CYP17, GSTP1, PON1 and GLO1 gene polymorphisms as risk factors for breast cancer: an Italian case-control study. BMC Cancer 20 (9): 115.
- Rulli A, Antognelli C, Prezzi E, Baldracchini F, Piva F, et al. (2006) A possible regulatory role of 17beta-estradiol and tamoxifen on glyoxalase I and glyoxalase II genes expression in MCF7 and BT20 human breast cancer cells. Breast Cancer Res Treat 96(2): 187-196.