Computational Prediction of Protein-Protein Interactions in Protein Disulphide Isomerase (PDI) Protein Family in the Red Cell Membrane of PV Patient Polycythaemia Vera - Computational Proteomics Analysis

Abstract
Polycythaemia Vera (PV) is considered a serious myeloproliferative neoplasm (MPN). The most characteristic feature of this disease is an abnormally high red blood cell mass and/or haematocrit caused by uncontrolled erythrocyte production that is independent of the normal regulatory processes of erythropoiesis. Recent findings have identified four proteins, PDIA6, TXND5, ERP44 and PDIA1, which belong to the Protein Disulphide Isomerase (PDI) protein family and which show a significant increase in abundance in the red cell membrane of PV patients compared with healthy controls. The root cause of PV has been identified as the V617F mutation in JAK2. In this study we attempted to model the protein-protein interactions of JAK2 with the above PDI proteins using the PRISM (Protein Interactions by Structural Matching) algorithm, which bases its predictions on the structural properties of any given pair of proteins and their evolutionarily conserved relationships. The analysis of the four PDI proteins with respect to JAK2 resulted in one meaningful interaction, in which JAK2 and TXND5 interact via CDK5, a well-known cell cycle associated protein. This finding will shed new light on the understanding of the molecular basis of PV and lead to the discovery of new drugs for this disease.

... in complexes Berggård, et al. Protein-protein interactions are involved in almost all cellular processes: regulation of the cell cycle, gene expression, cell-to-cell interactions, and metabolic and developmental control Braun et al. [2]. The concept of "protein interaction" is generally used to describe the physical contact between proteins in which they interact via their interfaces Tuncbag, et al. [3]. Therefore, studying the interface properties of proteins will help explain their role in protein-protein interactions. The biological properties of a protein molecule depend on its physical interaction with other molecules.
Therefore, studies that identify the interaction sites of proteins, and that establish which proteins interact with which other proteins and molecules, are essential to better understand their role within the cell and the basis of many cellular processes Rao, et al. [4]. Proteins that can interact with multiple partners play a central role and act as hub proteins in the network of protein-protein interactions Higurashi, et al. [5]. Because of the pivotal role that protein interactions play in cellular processes, they are central to the control mechanisms leading to healthy and diseased states in organisms Kar, et al. [6,7].
Mutations in proteins can occur in active sites, allosteric binding sites and DNA binding sites. These mutations can cause changes in proteins and affect their interactions, which may lead to dysfunction of some interactions and cause diseases such as cancers Kar, et al. [6,7]. Therefore, elucidation of protein interaction networks could reveal the molecular basis of diseases, which in turn could provide insights into developing methods of prevention and diagnosis Kann [7,8], and could also lead to the development of treatments through the identification of possible drug targets Pedamallu, et al. [9].
Considering the importance of studying protein-protein interaction networks, several approaches have been developed to detect these interactions. Approaches based on genome-wide experimental methods such as the yeast two-hybrid test, protein chips and mass spectrometric analysis have led to the detection of numerous interactions. Recently, a number of computational approaches have been developed for the prediction of protein-protein interactions (PPIs). Computational methods for the prediction of PPIs provide a fast and inexpensive alternative that complements experimental efforts. Computational interaction studies can be used to validate experimental data and to help select potential targets for further experimental screening Shoemaker, et al. [10].

PRISM (Protein Interactions by Structural Matching) is a web server that can be used to explore protein interfaces and predict protein-protein interactions Ogmen, et al. [11]. The algorithm in PRISM principally seeks pairs of proteins that may interact in a dataset of protein structures (target dataset) by comparing them with a dataset of interfaces (template dataset), which is a structurally and evolutionarily representative subset of the biological and crystal interactions present in the Protein Data Bank (PDB) Berman, et al. [12].

Survival of untreated patients is reported to be in the range of 6 to 18 months, whereas it is more than 10 years for treated patients, which highlights the seriousness of PV Stuart, et al. [13]. PV occurs with a slight predominance in men, and the incidence of PV is 2-3 per 100,000 persons per year. At the cellular level, the cause of polycythaemia is reported to be increased proliferation or decreased apoptosis of erythroid progenitors, or delayed erythroid differentiation with an increased number ... [16]. With this background, in this paper we attempt to determine interactions between the JAK2 protein and PDI proteins in erythrocyte membranes of PV patients using the protein interaction prediction tool PRISM.

Methodology
In this study, we used the PRISM prediction algorithm to analyse the possible interactions between JAK2 and PDI proteins. For clarity, the methodology used in this study is shown in the work flow plan. The PRISM algorithm uses the rationale that if particular surface regions of any two proteins are spatially similar to the complementary partners of a known interface, in principle these two proteins can interact with each other via these regions. The prediction algorithm uses a template interface dataset and a target dataset of single protein structures to predict such potential interactions between target proteins. These two datasets and the details of the PRISM algorithm are described in the following sections.

Protein Interactions by Structural Matching (PRISM)
Protein interfaces are evolutionarily more conserved than other surface regions of proteins. On this basis, the PRISM algorithm Tuncbag, et al. [3] uses two data sets to model interactions, the template data set and the target data set Baspinar, et al. [17]. The target data set contains the surface regions of the two proteins whose interaction is to be modelled, i.e. JAK2 and the PDI proteins. For this interaction model we considered only binary interactions between PDI proteins and JAK2; in other words, only one PDI protein was considered with respect to JAK2 at a time. Therefore, at any given time the target data set consisted of the surface regions of JAK2 and the respective PDI protein. The analyses were made using the default settings provided with the template data set from the PRISM web server. The template data set Cukuroglu, et al. [18] consists of known, available structures from the PDB and their interfaces. The template set can be generated using various physical parameters, such as atomic distances and solvent accessibilities, and modelled using Voronoi diagrams. The generated template data set is used for structural matching with the targets provided. In this structural matching a number of parameters are considered, and a unique scoring system is used to select the optimum structural match between the given target protein surfaces and the templates.

Preparation of the Template Dataset and Hotspot Generation
The method of atomic distances is used to generate the template data set. To define the interfaces of the template structures, two types of residues are defined in each chain: "interacting residues" and "nearby residues". Two residues are considered interacting residues if any two atoms from them are closer than the sum of their van der Waals radii plus 0.5 Å. A residue is a nearby residue if it is a non-interacting residue and has a C alpha atom positioned less than 0.5 Å from an interacting residue in the same chain.

If the target is a multimeric protein, it is resolved into its respective domain chains. If the target chain contains DNA or RNA structures, they are not considered in this algorithm. The homologies of the target chains are accounted for first. The algorithm uses remote access to the Naccess service for the surface area calculation. The Naccess algorithm rolls a virtual solvent probe across the target surfaces, with the radius of a solvent particle taken as 1.4 Å. Therefore the depth or height of the rolled solvent probe will be the same. The surface traced by the centre of the probe gives the accessible surface area. The protein surface is the shell around the entire monomer surface. The surface regions in the target data set are extracted based on the relative accessible surface area of the residues. If the relative accessibility of a residue (i.e. its accessible surface area relative to that of the residue in an extended conformation) is more than 15%, it is labelled as a surface residue. Thereafter, the nearby residues among the surface residues are extracted. They are used to provide the structural scaffolds of the protein surfaces.

... value is set to 2 Å and the minimal match size is set to five residues. MultiProt performs structural alignment in two steps, known as local fragment alignment and global alignment. As only pairwise alignment was considered in this case, MultiProt first selects one molecule as stable and the other as pivoted.
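The distance and accessibility rules described in this section can be illustrated with a minimal sketch. It is not the PRISM or Naccess implementation: the van der Waals radii, the data layout (residue identifiers mapped to atom tuples) and all function names are assumptions made for illustration, and relative accessibility values are assumed to have been computed beforehand.

```python
import math

# Approximate van der Waals radii in angstroms (illustrative values, an assumption).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
CONTACT_MARGIN = 0.5   # angstroms added to the sum of the radii (from the text)
SURFACE_CUTOFF = 15.0  # percent relative accessibility (from the text)


def atoms_in_contact(atom_a, atom_b, margin=CONTACT_MARGIN):
    """Return True if two atoms are closer than the sum of their radii plus margin.

    Each atom is a tuple (element, x, y, z); this layout is hypothetical.
    """
    elem_a, *xyz_a = atom_a
    elem_b, *xyz_b = atom_b
    limit = VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] + margin
    return math.dist(xyz_a, xyz_b) < limit


def interacting_residues(chain_a, chain_b):
    """Return residue pairs across two chains that have at least one atom pair in contact.

    Each chain maps a residue identifier to a list of atom tuples.
    """
    pairs = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(atoms_in_contact(a, b) for a in atoms_a for b in atoms_b):
                pairs.add((res_a, res_b))
    return pairs


def surface_residues(relative_accessibility, cutoff=SURFACE_CUTOFF):
    """Return residues whose relative accessibility (percent) exceeds the cutoff."""
    return {res for res, rel in relative_accessibility.items() if rel > cutoff}


# Toy data: two residues per chain, one atom each.
chain_a = {"A10": [("C", 0.0, 0.0, 0.0)], "A11": [("N", 20.0, 0.0, 0.0)]}
chain_b = {"B5": [("O", 3.5, 0.0, 0.0)], "B6": [("C", 40.0, 0.0, 0.0)]}
contacts = interacting_residues(chain_a, chain_b)
exposed = surface_residues({"A10": 42.0, "A11": 9.5})
```

In the toy data only the A10 and B5 atoms lie within the contact limit (3.5 Å against a limit of 3.72 Å), and only A10 exceeds the 15% accessibility cutoff.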

Structural Alignment and Transformation
This is then processed for all possible combinations, and those falling below the predefined root mean square deviation (RMSD) values of their 3D Euclidean transformations are considered structurally similar. The core alignment task was done using C alpha atoms. Finally, a global alignment step was performed to search for the largest structural cores between the aligned molecules. A combination of fragments is selected heuristically because finding the optimal combination is an NP-hard problem. When a unique combination is obtained, the similarity in that combination is calculated by means of RMSD values.
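As an illustration of the RMSD criterion, the following sketch computes the RMSD between two sets of paired C alpha coordinates that are assumed to be already superimposed, and compares it with a cutoff. The function names and the default cutoff of 2.0 Å are illustrative assumptions; this is not the MultiProt implementation, which also searches for the transformation itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of 3D points.

    The points are assumed to be already superimposed and paired in order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def structurally_similar(coords_a, coords_b, rmsd_cutoff=2.0):
    """Return True if the RMSD is below the cutoff (the default is an assumed value)."""
    return rmsd(coords_a, coords_b) < rmsd_cutoff


# Toy C alpha traces: the second is the first shifted by 1 angstrom along y.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
value = rmsd(ca_a, ca_b)
```

Because every paired atom is displaced by exactly 1 Å, the RMSD of the toy example is 1 Å and the pair is accepted under the assumed cutoff.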
The transformation and filtering are carried out by checking structural matching thresholds. The threshold is set to 50% in this analysis: at least 50% of the residues in the template domain should match the provided surfaces for the algorithm to proceed. The number of matching residues should also be at least 15. Target proteins are then transformed onto their similar template interfaces to form the complex structure. The transformation matrix for each match is generated, and each target protein is transformed onto the corresponding template interface partner. In this step the entire target structure is considered instead of the surface region. If two partners have more than five spatially colliding residues, the match is eliminated. Side chain clashes are neglected in the analysis.
To guarantee the correct matching, the results are checked to establish whether there are at least five contacts between matching residues of the complementary partners of the template interface.
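The filtering criteria stated above can be summarised in a short sketch. The `Match` record and `passes_filters` function are hypothetical names introduced for illustration; the threshold values are those given in the text, and the counts are assumed to have been computed by the alignment and transformation steps.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Summary of one target-to-template match (hypothetical record layout)."""
    template_interface_size: int  # residues in the template interface partner
    matched_residues: int         # template residues matched by the target surface
    colliding_residues: int       # spatially colliding residues after transformation
    interface_contacts: int       # contacts between matching residues of the two partners


def passes_filters(match, min_fraction=0.5, min_matched=15,
                   max_collisions=5, min_contacts=5):
    """Apply the four thresholds from the text to a single match."""
    if match.template_interface_size <= 0:
        return False
    fraction = match.matched_residues / match.template_interface_size
    return (
        fraction >= min_fraction                       # at least 50% of template residues
        and match.matched_residues >= min_matched      # at least 15 matching residues
        and match.colliding_residues <= max_collisions # no more than five collisions
        and match.interface_contacts >= min_contacts   # at least five contacts
    )


accepted = passes_filters(Match(40, 22, 3, 8))
rejected = passes_filters(Match(40, 22, 6, 8))
```

The second example is rejected only because it has six colliding residues, one more than the stated limit.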

Results
Initially, all four proteins (PDIA6, TXND5, ERP44 and PDIA1) were tested as potential partners of JAK2 using PRISM. PDIA1 and PDIA6 yielded a clear, unambiguous result indicating no interaction with JAK2. ERP44 revealed three potential interactions with JAK2 through three protein domains with the following PDB IDs: 2J23, 10BB and 3ZRJ. When these three interacting proteins were searched in the UniProt, PDB and Swiss-Prot databases, none of them was reported to be found in Homo sapiens.
Therefore, the proteins that gave these interactions were not considered ... However, analysis of the sequences of these proteins revealed low sequence similarity. Low sequence similarity between JAK2 and TXND5 indicates that they are highly unlikely to interact directly Pearson [22]. In order to discover any possible interactions between JAK2 and TXND5, a structural alignment analysis was carried out using the MultiProt algorithm, which bases its analysis on the 3D structural properties of the JAK2 and TXND5 proteins. The output of the structural alignment of two proteins is based on the RMSD (root mean square deviation), which is a measure of their divergence from one another. The RMSD value was 1.98 in this alignment. The structural alignment gives a superimposition of the atomic coordinates. The minimum information generated by each structural alignment is a set of superimposed coordinates for each input structure. Table 1 shows the structurally aligned residues of JAK2 with respect to their matching residues in TXND5. When aligning the two structures, side chain atoms were not taken into account. Analysis of the results of the MultiProt algorithm, as shown in Table 2, revealed a high structural alignment between the sequentially unrelated proteins JAK2 and TXND5. This revealed that these two proteins may interact with each other via a complex irrespective of their low sequence similarity. In this protein complex the target protein (TXND5) is predicted to act as a putative binding ligand to the protein complex.

Hot Spot Generation and Interface Defining of JAK2 (PDB ID: 2B7A)
Hot spots were generated because the interactions between two interfaces occur through them. The input for the HotPoint web server was the PDB format file of JAK2, which plays the key role in the pathophysiology of PV. Along with the PDB format file, the chain identifiers of the domains of JAK2 were also supplied as an input parameter for the analysis in order to define the interfaces.
The output of the HotPoint server is given as a table (Table 1) which consists of the interface residues and their features. The interface residues were tabulated with chain names, one-letter residue names, residue numbers, their relative ASA (accessible surface area), relative ASA in monomer and total pair potentials. In the last column of the table (Table 1), the prediction is presented as H (hot spot) or NH (non-hotspot). The identified hot spots are highlighted (Table 1).

Construction of the Interaction Complex by Searching Through Template Data Set
... 1UNH). CDK5 is a well-known cell cycle associated protein that has cyclin-dependent protein kinase activity. The output from the hot spot analysis of CDK5 is illustrated in Table 3. Here the interface residues were tabulated with chain names, one-letter residue names, residue numbers, their relative ASA in complex, relative ASA in monomer and total pair potentials. In the last column of the ...

Discussion
PPIs occur as a result of physical contact between proteins and their interacting partners. These interactions occur via their interfaces. Therefore, studying the interface properties of interacting protein molecules provides useful insights into the molecular mechanisms involved in cellular functions and biological processes.

Furthermore, interactions between proteins carrying mutations that result in dysfunctional protein interactions are considered to be a common mechanism leading to diseases, including cancer Kar, et al. [6,7]. Hence, studying the underlying protein interactions is likely to enhance our understanding of the molecular mechanisms that bring about diseases. In the present study, the focus was to predict protein-protein interactions centred on the JAK2 protein with the V617F mutation. This mutation is known to result in the production of a constitutively activated JAK2 protein, which is considered to play an important role in the development of Polycythaemia Vera ...

... there is no evidence of the presence of these three interacting proteins in Homo sapiens; thus they were not considered for further analysis. One interaction was predicted for TXND5 with JAK2 and was subjected to further analysis. Structural alignment of the two target proteins JAK2 and TXND5 revealed considerable structural similarity of 62% (RCSB PDB) between them despite their low sequence similarity of 11.9%. It can therefore be deduced from the structural alignment that JAK2 and TXND5 have a close evolutionary relationship and hence are more likely to interact with each other through a complex.

JAK2, which is a multimeric protein, was analysed first in our study in order to discover all the possible matches from the template data set, as a multimer has a higher probability of finding a matching partner in the template data set than a monomer, which has fewer potential matches. Using the HotPoint web server, the hot spots of JAK2 were generated based on a few simple rules: if an interface residue resulted in a binding energy of more than 2.0 kcal/mol it was categorized as a hot spot, and if the binding energy of a surface residue was less than 2.0 kcal/mol it was categorized as a non-hotspot by default Lucet, et al. [25].
At the same time, the rules that define surface residues also apply when generating hot spots. In addition, interface residues were identified from the given structure using a threshold level of relative accessible solvent area (RASA) for a given residue (Baspinar et al., 2014). If the RASA of a given residue was greater than or equal to 20%, it was categorized as an interface residue. This analysis led to the identification of six distinct hot spot residues in the JAK2 protein that can be considered to have the highest probability of interacting with another protein.
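The two rules stated above (the RASA threshold for interface residues and the binding energy threshold for hot spots) can be expressed as a small decision function. This is a sketch of the rules as described in this paper, not the HotPoint server's code; the function name and labels are assumptions, and a residue with a binding energy of exactly 2.0 kcal/mol, which the text does not cover, is treated here as a non-hotspot.

```python
def classify_residue(rasa_percent, binding_energy,
                     rasa_cutoff=20.0, energy_cutoff=2.0):
    """Label a residue as 'H', 'NH', or 'not interface' using the stated thresholds.

    rasa_percent   -- relative accessible solvent area of the residue, in percent
    binding_energy -- binding energy contribution of the residue, in kcal/mol
    """
    if rasa_percent < rasa_cutoff:
        return "not interface"
    # Assumption: exactly energy_cutoff counts as a non-hotspot.
    return "H" if binding_energy > energy_cutoff else "NH"


# Hypothetical residues given as (RASA in percent, binding energy in kcal/mol).
labels = [classify_residue(r, e) for r, e in [(35.0, 2.6), (35.0, 1.1), (12.0, 3.0)]]
```

With the hypothetical inputs above, the three residues are labelled H, NH and not interface, respectively.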
The computational approach, which generated the hot spots and determined their accessibility for possible interaction on the extracted surfaces of the two target proteins JAK2 and TXND5, revealed that they could not interact directly. ... [27,28]. The JAK2/STAT signalling pathway affects cellular activities such as cell proliferation, migration, growth, differentiation and cell death Duan, et al. [29].
Analysis of the functional role of the JAK2 protein with the V617F mutation is ...

TXND5 is the third protein of the three-member protein complex that has been the focus of the present study [33][34][35]. TXND5 is a member of the PDI protein family, whose members have been found to contain at least one thioredoxin (Trx) domain. Erythrocytes are constantly exposed to oxidative stress as they are rich in oxygen, and they have an extensive array of antioxidants to counter this level of stress 2012. In PV patients, erythrocytes are exposed to very high levels of oxidative stress, and an increased abundance of thioredoxin domain containing PDI proteins such as TXND5 has been reported Kottahachchi, et al. [16]. It is suggested that PDI proteins including TXND5 could act as an inhibitory factor for oxidative stress, enhance the tolerance of erythrocytes to oxidative stress and delay oxidative stress induced apoptosis Kottahachchi, et al. [16]. This process may provide a mechanism for the prolonged life span of erythrocytes and could contribute to the high erythrocyte mass found in PV patients [40][41][42][43].
The above evidence reveals that CDK5 is a major component of the neuronal cell apoptotic process and lays a foundation for the implication that CDK5 interacts structurally as well as functionally with TXND5, whose major function is also negative regulation of the apoptotic process [44][45][46][47]. Furthermore, it is believed that JAK2 and CDK5 are very likely to be involved in the negative regulation of apoptosis in red blood cells as well [48][49][50]. Interactions between proteins play an essential role in the proper functioning of living cells, and PPIs mediate essentially all biological processes. We have elucidated a pathway in which the two target proteins JAK2 and TXND5 interact through CDK5. To form interactions, two proteins should share structural similarities as well as common functional properties [51][52][53]. The three proteins have structural similarities, such as the presence of thioredoxin-like domains (Trxs), and these Trxs act against oxidative stress induced apoptosis of cells. Patients with PV carry the somatic JAK2V617F mutation in their hematopoietic cells [54][55][56][57]. This mutation results in constitutive activation of the JAK2 tyrosine kinase, which increases the phosphorylation activity of JAK2 and in turn promotes spontaneous cellular growth and induces uncontrolled erythrocytosis [58][59][60][61]. With JAK2 and CDK5 being kinases that regulate the cell cycle and TXND5 carrying an anti-apoptotic function, the JAK2-CDK5-TXND5 pathway could additionally support the hypothesis of overproduction of red blood cells in PV patients and of their prolonged life span [62][63][64][65].
In this study, PRISM, a template-based computational protein-protein interaction prediction tool, was successfully employed and predicted that the JAK2 protein interacts with TXND5 via CDK5. As stated earlier, PRISM bases its predictions on the interfaces of a large number of non-redundant protein templates extracted from the PDB. Because the analysis deployed a template data set comprising 22604 protein interfaces with known interactions, it enhances both the reliability and the accuracy of the results obtained for the JAK2 protein interacting with TXND5 via CDK5 [66][67][68]. In an attempt to validate the overall accuracy of PRISM, it was compared with the STRING database. The calculated F measure demonstrated that the PPI predictions generated computationally by PRISM from the structural properties of proteins and the PPIs based on experimental data in STRING are comparable, and that both methods have similar performance as tools for elucidating protein-protein interactions [69]. Furthermore, PRISM appears to be an appropriate computational tool, reported as a web server that enables efficient prediction of protein-protein interactions with high accuracy. However, before these findings are applied, the results achieved by computational methods should be validated experimentally.
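The text does not state how the F measure was computed. Assuming the standard definition (the harmonic mean of precision and recall), a comparison of a predicted interaction set against a reference set could be sketched as follows; the function name and the toy protein labels P1 to P4 are hypothetical and are not data from this study.

```python
def f_measure(predicted, reference):
    """Harmonic mean of precision and recall for two sets of interactions.

    Each interaction is a frozenset of two protein names, so pair order is ignored.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def pair(a, b):
    """Build an unordered interaction pair."""
    return frozenset((a, b))


# Toy sets: two of the three predictions appear among the four reference pairs.
predicted = {pair("P1", "P2"), pair("P1", "P3"), pair("P2", "P4")}
reference = {pair("P2", "P1"), pair("P2", "P4"), pair("P3", "P4"), pair("P1", "P4")}
score = f_measure(predicted, reference)
```

In this toy case precision is 2/3 and recall is 1/2, giving an F measure of 4/7.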