Risk models and scores for metabolic syndrome: a systematic review

Background: Metabolic syndrome (MetS) is linked with an increased risk of cardiovascular disease, diabetes and all-cause mortality. Despite the large number of models and scores for assessing the risk of developing MetS, hardly any is used in practical settings. Hence, we conducted a systematic review to determine the performance of risk models and scores for predicting metabolic syndrome. Methods: We systematically searched MEDLINE, CINAHL, PubMed and Web of Science to identify studies that either derived or validated risk prediction models or scores for predicting the risk of metabolic syndrome. Data concerning the models' statistical properties as well as details of internal or external validations were extracted. Tables were used to compare the various components of the models and their statistical properties. Finally, PROBAST was used to assess the methodological quality (risk of bias) of the included studies. Results: A total of 15102 titles were scanned, 29 full papers were analysed in detail and 24 papers were included. The studies reported the development, validation or both of 40 MetS risk models; of these, 24 models were studied in detail. There is significant heterogeneity between studies in terms of geography/demographics, data type and methodological approach. The majority of the models or risk scores were developed or validated using data from cross-sectional studies, or routine data that were often assembled for other reasons. Various combinations of risk factors (predictors) were considered significant in the respective final models. Similarly, different criteria were used in the diagnosis of MetS, but the NCEP criteria, including their modified versions, were by far the most widely used (32.5%). There is generally poor reporting quality across the studies, especially concerning statistical data. Any form of internal validation was either not conducted, or not reported, in nearly a fifth of the studies.
Only two (2) risk models or scores were externally validated. Conclusions: There is an abundance of MetS models in the literature, but their usefulness is doubtful due to limitations in methodology, poor reporting, and a lack of external validation and impact studies. Therefore, future researchers should focus more on externally validating and applying such models in different settings.


Background
The prevalence of Metabolic Syndrome (MetS) has increased significantly over the last three decades [1]. In 2015, the International Diabetes Federation (IDF) suggested that approximately a quarter of the global adult population (>20 years) has MetS [1]. This figure is disturbing, especially because individuals with MetS have an increased risk of cardiovascular disease (CVD) (2- to 3-fold) [2] and type 2 diabetes (T2DM) (up to 5-fold) [3]. Additionally, the number of those with the syndrome may well reach a frightening level in the near future given the current worldwide trends of rising prevalence of hypertension, obesity and diabetes [1,4]. Therefore, it is necessary to adopt effective preventive strategies and work in a systematic way to reduce the rising burden of morbidity and mortality related to MetS.

However, studies of MetS are often challenging and problematic. First, there is a lack of a universally acceptable definition of MetS. Although the various proposed definitions utilise the same metabolic risk factors in defining the cluster, they vary regarding the cut-off points of the individual components, the weight assigned to certain components, or even the pathophysiological reasoning behind the clustering [5]. This lack of a generally acceptable definition makes comparison between studies difficult. Second, the exact mechanism that brings about MetS remains unclear despite advances in pathophysiology and risk factor identification. Certainly, the widely observed differences in susceptibility and age of onset are highly suggestive of a major interplay between genetic and environmental factors [6]. However, the main aim of developing the concept of MetS is not to describe or identify the biologic basis of the constellation, but rather to identify individuals with an increased risk of CVD and T2DM. Indeed, the term 'syndrome' is used to acknowledge that there are numerous possible pathophysiological mechanisms that can explain the clustering of risk factors.
Risk prediction models are of great significance in supporting decision making in both clinical and public health practice, and are increasingly being incorporated into guidelines [7,8]. For instance, in cardiovascular disease (where the application of models is most advanced), several prediction models have been developed and are currently in use, e.g. QRISK (qrisk.org), the Framingham risk score (framinghamheartstudy.org) and the ASSIGN score (assign-score.com). Furthermore, prediction modelling is becoming more popular in chronic disease research due to the increasing availability of large datasets, advanced statistical methods and computational power [8,9]. This may have a crucial role to play in informing how the rising burden of MetS on public health can be reduced. However, researchers who either develop or validate multivariable prediction models face several challenges. Indeed, regardless of disease area or discipline, these challenges include poor reporting of prediction model studies, use of inappropriate statistical techniques, small sample sizes, poor handling of missing values and absence of validation [10][11][12][13][14][15]. These methodological deficiencies result in models that are not, or should not be, utilised. For that reason, it is not surprising that, relative to the large number of models published, only a few are widely implemented or utilised in real-life settings [16].
In recent years, there has been a proliferation of models and scores for assessing the risk of developing MetS; however, to the best of our knowledge, none is in routine use in either clinical or public health settings, and there is no available systematic review in the academic literature. We believe this may present a confusing picture for GPs, public health specialists and policymakers alike, who are potentially faced with a very complex literature, multiple different methodologies, and probably very few studies of use in real life.
We conducted a systematic review to determine the performance of risk models and scores for predicting metabolic syndrome.

Aim
This review aims to determine the performance of risk models and scores for predicting metabolic syndrome.

Objectives

1. To systematically review known metabolic syndrome risk models and scores.
2. To analyse the demography of the populations from which the models and scores were derived and/or validated.
3. To analyse the final components of the models and scores and their contribution to overall risk.

This review follows the standard methodology for systematic reviews reported in a previous study as well as in the York Centre for Reviews and Dissemination guidelines [17,18].

Search strategy
A mixed search strategy involving both electronic and manual searches [19,20] was adopted in this study. The search strategy was designed with the help of an IHR specialist librarian (DA), and relevant guidance was drawn from "Systematic Reviews: Centre for Reviews and Dissemination guidance for undertaking reviews in health care" and "Systematic Reviews to Support Evidence-Based Medicine" [17,18] to identify any relevant studies of metabolic syndrome risk models and scores. The final search strategy was implemented by MI and double-checked by DP, GR and YP. The final search was conducted on 21 September 2018.
The literature was searched using keywords that included: predict, screen, risk, score, metabolic syndrome, insulin resistance syndrome, model, regression, risk assessment, risk factor, calculator, analysis, sensitivity and specificity, ROC and odds ratio. Both MeSH terms and text words were used. Articles were searched using titles and abstracts; the search was limited to studies conducted in English, but no date restriction was applied.
The details of the search strategies used can be found in supplementary material 1.0.
The literature search was conducted in the following databases: MEDLINE, CINAHL, Web of Science and PubMed.

Eligibility criteria
We included peer-reviewed studies that combined two or more known risk factors to derive a metabolic syndrome risk model or score, validated a pre-existing model on a different population, or did both. Furthermore, the main outcome of this review is metabolic syndrome, and the secondary outcomes are any related predictive outcomes (including discrimination and calibration). Finally, this review only includes studies published in English.
We excluded studies on screening and early detection, genetic mutation models, studies conducted on animals, studies investigating one or more single risk factors that were not combined to build a model or score, and studies that applied another disease's model or score to predict MetS. We also excluded studies whose main outcome was not metabolic syndrome and studies that did not report any related predictive outcomes (either discrimination or calibration). Finally, we excluded studies conducted in languages other than English.

Selection of studies
A total of 16821 titles were transferred into the electronic reference software EndNote version 8 (endnote.com), and duplicates were removed automatically, resulting in a total of 15222 titles. The duplicate titles that were not removed automatically by EndNote were removed manually, leaving 15102 titles.
All 15102 titles were scanned by MI, and if a title was suspected to represent a paper meeting the inclusion/exclusion criteria, the entire abstract was reviewed.
Title scanning and abstract review were completed in November 2018. A total of 66 titles were marked as potentially meeting the inclusion criteria. Of these, ten (10) titles were double-checked by DP.
The full paper review was conducted by applying the inclusion/exclusion criteria to the retrieved articles. At this stage, studies were excluded for the following reasons: predicting genetics (1); investigating one or more single risk factors that were not combined to build a model or score (20); using unconventional predictors (alternative medicine) (1); applying another disease's model or score (CVD, T2DM) to predict MetS (4); the main outcome was not metabolic syndrome (8); not reporting any related predictive outcomes, either discrimination or calibration (7); and being conducted in a language other than English (2). This reduced the number of full papers to 23.

Full papers from other sources
In order to identify further relevant articles, a manual search of the reference lists of all the selected articles was conducted. Furthermore, relevant "grey literature" was searched for in the following: The Grey Literature Report (www.greylit.org/), OpenGrey (www.opengrey.eu/) and OAIster (www.oaister.org/). From the above, three further papers were added from the initial scoping search, and three from the reference lists of the included papers. However, the search for grey literature yielded no results. This brings the total number of articles selected for data extraction to 29.
The selection process is summarised using the PRISMA flow diagram (PRISMA 2009) (see figure 1.0).

Data extraction
Data extraction was conducted using a standard form adapted from a similar study [21] and saved in Microsoft Excel 2016. The extracted data were on those variables relevant to the review question and which satisfied the conditions for the narrative synthesis conducted. It is noteworthy that some of the studies presented several models, each composed of different risk factors. However, it was beyond the scope of this review to study each of those models in detail. Furthermore, the researchers themselves often concluded that one of their reported models was clearly better than the others in terms of performance. Therefore, where this was the case, data were extracted from the authors' preferred model(s) or (if no clear preference was stated in the article) from the model judged to be more detailed or statistically robust. During the data extraction, a total of five studies were excluded, leaving 24 articles.
The primary data extraction was conducted by MI and double-checked by DP, GR and YP, and discrepancies were resolved by discussion.

Assessment of methodological quality
PROBAST (Prediction model Risk Of Bias ASsessment Tool) [11], a tool for assessing the risk of bias and applicability of prognostic model studies, was used to assess the quality (risk of bias and applicability) of the included studies. Briefly, PROBAST is a recently developed tool for assessing the quality of primary studies included in a systematic review. It evaluates both the risk of bias and issues concerning the applicability of studies that develop, validate or update a multivariable model (diagnostic or prognostic). PROBAST comprises 4 domains covering 20 signalling questions to enable the assessment of risk of bias and applicability. These domains concern the participants (e.g. the study design used and whether appropriate inclusion/exclusion criteria were applied), the predictors used, the outcome, and how the analysis was conducted.
Aside from its specific purpose of appraising studies in systematic reviews of prediction models, PROBAST can also be used for the general critical appraisal of primary prediction model studies. Notably, PROBAST is not meant to generate a summary "quality score", owing to the documented drawbacks of such scores [22]; instead, users should discuss the effect of the problems observed within each domain [21]. In summary, the quality assessment revealed a moderate-to-high risk of bias across the included studies, primarily due to the use of inappropriate study designs and the absence of external validation. On closer examination, the majority of the models suffered from a high risk of bias and significant methodological deficiencies arising from poor choice of model analyses, significantly underpowered analyses, dichotomisation of continuous variables, lack of adjustment for optimism, poor handling of missing data and overall poor model presentation.
Prioritising/ranking models or risk scores

The number of papers and risk models or scores included in the final sample of this review is relatively high. Therefore, for clarity, it was decided to highlight the risk models or scores with the most potential to be useful to end users, i.e. practitioners, policymakers or laypersons. For any prediction model or risk score to be considered useful, it should be accurate (statistically significant calibration, and discrimination above 0.70), generalisable (externally validated by a separate research team on a different population) and usable (having few components, all commonly used in practical settings) [23]. However, the MetS prediction discipline is arguably still in its early phase of development; therefore, it is difficult to identify any model or score that fulfils all of the above criteria. Hence, to prioritise risk models or scores in this study, we developed pragmatic criteria by modifying the criteria set by Altman et al. [23]. A similar approach was used by Noble et al. [20].
Studies were favoured if they used prospective/cohort data to develop their model, reported discrimination above 0.70 and/or calibration, and had few components that are commonly used in practical settings. The three prioritised risk models or scores are summarised in an easily accessible table (see table 2.0 below).

Results
A total of 29 full papers were analysed in detail, after which a final sample of 24 papers was produced. Of these 24 papers, 22 report the development of one or more risk models or scores [24][25][26][27][28][29][30][31][32][33][34][35]37,38,[40][41][42][43][44][45][46][47], and 2 report both the development and the external validation of one or more risk models or scores on an external population [36,39]. Overall, the 24 studies reported 40 models, of which 24 were selected for full data extraction. The remaining 16 models were not selected either because they were judged to be minimally different from the reported ones, because they were not the authors' preferred models, or because they were significantly deficient in detail or statistical reporting. Furthermore, the publication dates of the included studies ranged from 2008 to 2018, with the majority appearing within the past 3 to 4 years (see figure 2.0).
Table 3.0 provides the detailed characteristics of the included studies. In summary, there is high heterogeneity among the studies. Studies were conducted in 16 countries across 6 continents (10 in Asia, 4 in North America, 4 in Europe, 4 in the Middle East, 1 in South America, 1 in Australia), but none in Africa.
Similarly, due to the heterogeneity of the data and differences in methodological approach and presentation, it is challenging to make comparisons across studies. In terms of study design, the majority (seventy per cent) of the models or risk scores were developed or validated using data from cross-sectional studies, or routine data that were often assembled for other reasons; the remaining thirty per cent reported using cohort data [33-35, 41, 43, 47]. Surprisingly, the observed incidence/prevalence of MetS at the end of the study is not reported in nearly a third (twenty-nine per cent) of the studies [24,28,31,37,41,42]. For those that reported it, the prevalence/incidence of MetS ranged from as little as one per cent to as high as thirty-seven per cent. As expected, the lower prevalence/incidence rates were observed in studies of children, and the higher rates in studies with adult participants.
In all 24 included risk scores, various combinations of risk factors (predictors) were considered significant in the respective final models, and different weights were assigned to different components across the various models. The number of predictors utilised in a single risk score ranged from 2 to 11 (mean 5.6, SD 1.95). Similarly, different criteria were used in the diagnosis of MetS, including IDF, NCEP and cMetS; but the NCEP criteria, including their modified versions, were by far the most widely used (32.5%).
There is generally poor reporting quality across the studies, especially concerning statistical data. For instance, calibration (in any statistical form) was reported by only three of the twenty-four studies [35,36,47]. Similarly, one-fifth of the studies did not report sensitivity/specificity, and two-thirds did not report positive and negative predictive values. The Area Under the Receiver Operating Characteristic Curve (AUROC) ranged from 63.0 to 97.8; one study did not report any AUROC.
Regarding the validity of the risk models or scores, any form of internal validation was either not conducted, or not reported, in nearly a fifth of the studies. The commonest technique of internal validation used was ROC analysis (eighteen studies). Similarly, only two (2) risk models or scores were externally validated. Moreover, all the external validations were done by the same authors and reported in the same paper as the corresponding model development.
Studies differed in the biomarkers they used to capture certain components of MetS. For example, as a measure of body fat composition (obesity), 18 studies used waist circumference, 4 studies used BMI, and 2 studies used neither. Similarly, fasting blood glucose (FBG) was the most popular marker of glycaemia: 14 studies utilised FBG, 3 studies used HOMA-IR, one study used salivary glucose, and 6 studies used no biomarker of blood sugar at all. In addition to the traditional biomarkers/predictors of MetS (i.e. abdominal obesity, blood pressure, blood glucose, triglycerides and HDL-cholesterol), some studies employed other (novel) biomarkers, such as salivary biomarkers (as opposed to blood biomarkers) [46], phenotypic biomarkers (double chin, buffalo hump) [37], quadriceps muscle peak torque/body mass (Nm/kg) [38], and lifestyle factors (alcohol, moderate physical activity, smoking, food insecurity, the habit of eating less salt, dairy consumption) [36].
Reporting the model equation adds to the reporting quality and replicability of a study. However, the final model equation was reported in only 15 studies.
Also, one study reported having an online risk calculator, and another developed a nomogram.

Discussion
This systematic review revealed that there are numerous MetS prediction models and scores in the literature. This finding is similar to what is seen in other chronic diseases such as CVD [15,48,49] and T2DM [13,14,20]. There is some degree of diversity regarding the geographical origin of the models in this review. Indeed, at least one model has been developed in almost every continent of the world, the exception being Africa. With nearly seventy-five per cent of all chronic disease mortality occurring in middle- and low-income countries (including African countries) [50], this finding calls for serious action by researchers from the continent. Having MetS models from those countries is necessary because it is well established that predictor-outcome associations in risk prediction models vary significantly between ethnicities [13,51].
Poor conduct and reporting of prediction model studies is a common finding across most similar reviews [13-15, 20,48,49], and this often leads to the omission of vital information. This was also observed in most of the studies included in this review. The lack of standardised guidelines for both the conduct and the reporting of prediction model studies is believed to be largely responsible [10]. However, with tools such as TRIPOD [10] and PROBAST [11] being developed and validated, the situation is likely to change in the near future.
Assessing the overall performance of prediction models is necessary before translating research findings into real-world settings [52]. However, model performance can be affected in several ways. One is the handling of continuous variables: are they retained as continuous measures, or are they categorised into two or more groups [53]? Often, variables are dichotomised either at the median value or at an 'optimal' cut-off point (based on the minimum P-value). The practice of treating continuous risk predictors as categorical should be avoided, irrespective of the approach used [53]. Unfortunately, this is frequently observed in risk prediction model studies [12,[54][55][56]. In this review, categorisation of some or all variables was conducted in 65% of the studies. This finding is in keeping with what has been reported in similar reviews [13,15,49]. Dichotomising continuous variables risks a serious loss of information and of the statistical power to observe real associations; the effect is comparable to losing a third of the overall data, or even more if the data are distributed exponentially [57]. Therefore, it is recommended that, while developing a model, continuous risk predictors should be retained as continuous variables, with splines or fractional polynomial functions used if the relationship between the predictor and the outcome is nonlinear [58].
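The information loss caused by dichotomisation can be illustrated with a small simulation. The following is a minimal sketch using synthetic data; the `auroc` helper and all parameter values are illustrative and are not drawn from any included study:

```python
import math
import random

def auroc(scores, labels):
    """Concordance (c-statistic): the probability that a randomly chosen
    case receives a higher score than a randomly chosen non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(42)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # continuous predictor
# Outcome generated from a logistic model: risk rises smoothly with x.
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-1.5 * xi)) else 0
     for xi in x]

median_x = sorted(x)[n // 2]
x_binary = [1 if xi > median_x else 0 for xi in x]  # median split

auc_continuous = auroc(x, y)
auc_dichotomised = auroc(x_binary, y)
# The median split discards all within-group ordering, so the
# dichotomised predictor discriminates less well than the original.
```

In a typical run, the median-split predictor loses several points of discrimination relative to the continuous one, mirroring the loss of information and power described above.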
Another way in which the performance of models is affected is through missing values. Missing values are a common occurrence in most datasets. In fact, collecting data on all risk predictors for all individuals is a difficult task that is rarely achieved, whatever the study design used [13]. Researchers are then faced with the challenge of dealing with the missing values (especially if the study is based on retrospective cohort data). One common method is complete case analysis, which entirely excludes participants with missing data on any of the variables of interest [59]. However, this approach is not recommended: it not only discards useful information, it can also lead to biased results and conclusions [23]. Nearly half of the studies in this review failed to report how they treated missing values. This finding is in keeping with other similar reviews [12,13,15,54]. One of the most effective ways of minimising the effect of missing values is multiple imputation, but this is rarely conducted in studies of prediction models or scores [60]. Therefore, researchers should always report the completeness of the overall data and how missing values were dealt with, so that readers can judge the representativeness and quality of the data.
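A toy simulation makes the bias from complete case analysis concrete. This is a hedged sketch with synthetic data: the missingness mechanism and all numbers are invented purely for illustration, and multiple imputation itself (e.g. by chained equations) is not shown:

```python
import random

random.seed(7)
true_values = [random.gauss(100.0, 15.0) for _ in range(5000)]
true_mean = sum(true_values) / len(true_values)

def is_observed(value):
    """Missingness depends on the value itself: higher measurements
    are more likely to go unrecorded."""
    p_missing = min(max((value - 85.0) / 60.0, 0.0), 0.95)
    return random.random() > p_missing

observed = [v for v in true_values if is_observed(v)]
complete_case_mean = sum(observed) / len(observed)
# Because high values were preferentially dropped, the complete case
# mean underestimates the true mean: simply deleting incomplete records
# distorts the estimate rather than merely shrinking the sample.
```

When missingness is related to the underlying values, as here, complete case analysis yields biased estimates, which is precisely why imputation approaches that model the missing values are preferred.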
Again, there is a lack of consistency in studies of MetS prediction, as they used different predictors and statistical methods. At the very least, discrimination and calibration measures are recommended to be reported [61]. Although nearly all studies reported some form of discrimination, calibration was rarely reported: in this review, only two studies reported any form of calibration measure. This is similar to other relevant reviews [20]. It makes it difficult to compare across studies (e.g. by meta-analysis) and to assess their generalisability [62].
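Whereas discrimination asks whether cases score higher than non-cases, calibration asks whether predicted probabilities match observed event rates. One simple way of examining it is a grouped observed-versus-expected check, sketched below on synthetic, perfectly calibrated data; the function name and grouping scheme are illustrative assumptions, not a method from any included study:

```python
import random

def calibration_groups(probs, outcomes, n_groups=5):
    """Sort individuals by predicted risk, split them into equal-sized
    groups, and compare the mean predicted probability with the observed
    event rate in each group (a simple observed-vs-expected check)."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * size:] if g == n_groups - 1 else pairs[g * size:(g + 1) * size]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate))
    return rows

# Toy example that is perfectly calibrated by construction:
# outcomes are drawn with exactly the stated probabilities.
random.seed(3)
probs = [random.uniform(0.05, 0.95) for _ in range(10000)]
outcomes = [1 if random.random() < p else 0 for p in probs]
rows = calibration_groups(probs, outcomes)
# For a well-calibrated model, each group's observed rate sits close
# to its mean predicted risk.
```

For a miscalibrated model, the predicted and observed columns drift apart systematically, which is exactly the information that goes missing when studies report discrimination alone.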
Furthermore, the majority of the studies used common biomarkers (blood pressure, fasting blood sugar, cholesterol, triglycerides and waist circumference) as predictors in building their models. In addition, other novel predictors/biomarkers were used once or twice by some researchers. However, none of the models that reported using novel biomarkers has been used elsewhere or externally validated. A similar observation has been made in studies of CVD models [15]. This suggests that researchers in the field give more weight to identifying new predictors and building new models than to validating and applying existing ones.
Regarding the definition/criteria of MetS, there is significant heterogeneity amongst the studies, although the NCEP criteria [63], or their modified versions, are the most commonly used. This further complicates comparison between studies, because different definitions of the outcome result in differences in predictor effects and in the resultant model performance [15]. A more uniform definition would help significantly to mitigate this, making it easier to compare between studies and, eventually, to translate research findings into clinical settings [10].
When it comes to multivariable model-building strategies, the commonest approach used to derive the final model or risk score is automated selection (forward selection, backward elimination or stepwise; 52% in this review). However, automated selection is data dependent (it relies on statistical significance without reference to clinical significance). Additionally, this strategy often produces models that are unstable, with biased estimates, ultimately leading to poor predictions [64,65]. Furthermore, most of the studies included in this review described developing MetS prediction models, but the external validation of such models is seriously lacking. Certainly, the ultimate aim of any multivariable model study is to show that the model in question works [13]. It is therefore of paramount importance that a model's performance is assessed once it is developed. Generally, model performance can be assessed in two broad ways: internal validation and external validation [66]. Internal validation is done using techniques such as (in increasing order of evidence) split sample (in large cohorts), cross-validation, and resampling (the bootstrapping technique). External validation, on the other hand, involves applying the prediction model to a new sample that is entirely different from the development sample [62]. Only two models in this review were externally validated. Lack of external validation is a common problem of most prediction model studies [12,13,15,54].
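One common way to use the bootstrap for internal validation is to estimate the optimism of the apparent performance and subtract it. The sketch below makes the idea concrete with a deliberately overfitting toy "model" (a data-driven cutoff chosen to maximise apparent accuracy); the data, the model and every constant are synthetic and purely illustrative:

```python
import random

def fit_cutoff(x, y):
    """Toy 'model': choose the cutoff that maximises accuracy on the
    fitting data -- a data-driven choice that tends to overfit."""
    best_c, best_acc = 0.0, -1.0
    for c in sorted(set(x)):
        acc = sum((xi > c) == yi for xi, yi in zip(x, y)) / len(y)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c

def accuracy(cutoff, x, y):
    return sum((xi > cutoff) == yi for xi, yi in zip(x, y)) / len(y)

random.seed(11)
n = 80
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if xi + random.gauss(0.0, 1.5) > 0 else 0 for xi in x]  # noisy outcome

apparent = accuracy(fit_cutoff(x, y), x, y)  # performance on the same data

# Bootstrap estimate of optimism: refit on resamples, then compare
# performance on the resample (where the model was fitted) with
# performance of that same model on the original data.
optimisms = []
for _ in range(200):
    idx = [random.randrange(n) for _ in range(n)]
    bx, by = [x[i] for i in idx], [y[i] for i in idx]
    c = fit_cutoff(bx, by)
    optimisms.append(accuracy(c, bx, by) - accuracy(c, x, y))

optimism = sum(optimisms) / len(optimisms)
corrected = apparent - optimism  # optimism-corrected internal estimate
```

The corrected estimate is lower than the apparent one precisely because the data-driven fitting step capitalises on chance; external validation on a genuinely independent sample remains the stronger test.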
Head-to-head comparison of models assists in establishing which models perform better. No such comparative study was observed in this review, which makes it difficult to choose amongst, or advocate for, the existing models. Comparative studies (preferably of multiple models within a single study) are recommended for prognostic risk prediction models [15]. However, as important as the statistical characteristics of a prediction model may be, they do not guarantee its usefulness in a clinical or real-life setting. None of the models in this review is reported to have been applied in a clinical setting. Therefore, in the future, more emphasis should be given to impact studies: applying the models in clinical settings and assessing their ability to influence decision making or patients' outcomes.

Figures

Figure 1.0 PRISMA flow diagram describing the selection of studies

Table 1.0 Quality assessment of the included studies based on PROBAST. *ROB = risk of bias; (+) low ROB/low concern regarding applicability; (-) high ROB/high concern regarding applicability; (?) unclear ROB/unclear concern regarding applicability.

Table 1.0 above provides a summary of the quality assessment of the included studies.

Table 2.0 Components of three MetS risk models or scores with potential for adaptation for use in routine practice. BP = blood pressure, HDL = high-density lipoprotein cholesterol, WC = waist circumference, TG/TAG = triglyceride, FBG = fasting blood glucose.

Table 3.0 Summary of 24 papers from which 40 MetS risk models or scores were identified for systematic review. *N/S = not stated, N/A = not applicable, IDF = International Diabetes Federation, NCEP = National Cholesterol Education Program, cMetS = continuous metabolic syndrome score, MetSS = metabolic syndrome score, JAMRISK = Japanese metabolic risk score.