Olobatuyi Kehinde Ibukun1*, Asiribo Osebekwiin Ebenezer2 and Talabi Olusola Adetunji3
Received: October 19, 2022; Published: October 31, 2022
*Corresponding author: Olobatuyi Kehinde Ibukun, Department of Statistics, University of Milano-Bicocca, Milan, Italy
DOI: 10.26717/BJSTR.2022.46.007423
We study the properties of the so-called log-generalized gamma Burr III (LGGBIII) distribution, defined as the logarithm of a generalized gamma Burr III random variable (Olobatuyi, et al. [1]). An advantage of the new distribution is that it includes several classical distributions as special sub-models and can model unimodal hazard functions (HFs). We obtain formal expressions for the moments, moment generating function, quantile function, and mean and median deviations. We construct a regression model based on the new distribution to predict the relief time of headache patients and the time to death of breast cancer patients treated by mastectomy. The model can be applied to censored data, since it represents a parametric family that includes several widely known regression models as special sub-models. The regression model is fitted to a data set of 1207 eligible breast cancer patients. We predict the survival probability after mastectomy in terms of highly significant clinical and pathological explanatory variables associated with the death of the patients. The predicted survival probabilities are calculated under two nested models.
Keywords: Generalized Gamma Burr III Distribution; Censored Data; Log-Generalized Gamma Burr III Distribution; Log-Generalized Gamma Burr III Regression Model; Survival Function
Standard lifetime distributions usually present very strong restrictions to produce bathtub curves, and thus appear to be inappropriate for interpreting data with this characteristic. Some distributions were introduced to model this kind of data, such as the exponential power family [2], the beta integrated model [3], and the generalized log-gamma distribution, among others. A good review of these models is given, for instance, in [4]. In the last decade, new classes of distributions for modeling this type of data were developed based on extensions of the Weibull distribution. Examples include the exponentiated Weibull (EW) (Mudholkar, et al. [5]), the additive Weibull [6], the modified Weibull (Lai, et al. [7]), the beta Weibull (BW) (Famoye, et al. [8,9]) and the generalized modified Weibull (Carrasco, et al. [10]) distributions. Further, (Carrasco, et al. [10]) investigated several mathematical properties of the BW geometric distribution, which is a highly flexible lifetime model that copes with different degrees of kurtosis and asymmetry.
The Generalized Gamma Burr III (GGBIII) distribution, owing to its flexibility in accommodating different types of risk function depending on its parameters, can be used in a variety of problems in modeling survival data. The main motivation for the use of the GGBIII model is that it contains several distributions as special sub-models, such as the generalized gamma Fisk, Zografos and Balakrishnan-Burr III, Zografos and Balakrishnan-Fisk and Burr III distributions, among others. It has also been reported that the gamma model is the most effective model for analyzing highly skewed data such as survival data [11].
Breast cancer presents a major risk to American women, who have a 1 in 8 lifetime chance of developing the disease. The estimated incidence of invasive breast cancer in the United States for 2010 was 207,090 women, making it the most common cancer in women after skin cancer. Although survival has improved because of advances in treatment and earlier diagnosis resulting from the increased use of mammographic screening, fatalities in 2010 were put at 40,000.
Mastectomy is surgery that removes the entire breast. All the breast tissue is removed, sometimes along with other nearby tissues. If just the breast is removed (and not the lymph nodes under the arm), it is called a simple (or total) mastectomy. A simple mastectomy combined with an axillary lymph node dissection (discussed below) is called a modified radical mastectomy.
The primary end point was survival (DFS), defined as time to the earliest of either death (all-cause) or last follow-up. The follow-up period was defined in the same way. For the first time, we propose a log-generalized gamma Burr III regression model to predict the t-month survival probability after mastectomy in terms of highly significant clinical and pathological variables associated with the death of the patient after surgery. The study cohort comprises 1207 patients with clinically localized cancer treated by mastectomy. The data consist of the random response variable given by the number of months (y_i) after mastectomy. Uncensored observations correspond to patients with a recorded death time. Censored observations correspond to patients who were not observed to have died at the time the data were collected. The numbers of censored and uncensored observations are 1135 and 72, respectively, out of the total of 1207 patients.
In this article, we propose a location-scale regression model based on the LGGBIII distribution, referred to as the LGGBIII regression model, which is a feasible alternative for modeling the existing types of failure rate functions. Inference is carried out using the asymptotic distribution of the maximum likelihood estimators (MLEs). The sections are organized as follows. In Section 2, we define the LGGBIII distribution. Mathematical properties of this distribution are investigated in Section 3. In Section 4, we obtain the order statistics. We propose an LGGBIII regression model for censored data and discuss inferential issues in Section 5. In Section 6, a breast cancer data set is analyzed to show the flexibility, practical relevance and applicability of our regression model. Section 7 ends with some concluding remarks.
Generalized Gamma Burr III Distribution
Most generalized Burr III distributions, such as the beta Burr III distribution (Antonio and Silva, 2014), have been proposed in the reliability literature to provide a better fit to certain data sets than the traditional two- and three-parameter Burr III models. The GGBIII density function (Olobatuyi, et al. [1]) with five positive parameters, among them α, β, δ and k, is given by (t > 0)
where Γ(·) is the gamma function. Here, α and k are two additional shape parameters added to the Burr III distribution to model the skewness and kurtosis of the data. An important characteristic of the GGBIII distribution is that it contains several distributions as special sub-models, such as the Zografos and Balakrishnan-Burr III and Burr III distributions. The hazard and survival functions corresponding to (1) are
Shape of GGBIII Distribution
Plots of the density function of the Generalized Gamma Burr III distribution for selected parameter values are given in Figure 1. The plots indicate that the GGBIII density can be decreasing or right skewed.
The Log-Generalized Gamma Burr III Distribution
In this section, the log-generalized gamma Burr III distribution, denoted LGGBIII, is introduced. It is based on the logarithm of the continuous GGBIII random variable presented above. Some of its mathematical properties are studied, estimation by the method of maximum likelihood is discussed, and applications to two real data sets are described. The new distribution is shown to outperform at least two competing models, the log-ZBD and Cox models. Let T be a random variable having the GGBIII density function (1). The random variable Y = log(T) has a log-generalized gamma Burr III (LGGBIII) distribution, whose density function is reparametrized as
where −∞ < y < ∞, s > 0, −∞ < μ < ∞, σ > 0, α > 0 and β > 0. We refer to the new model (4) as the LGGBIII distribution, say LGGBIII(α, β, μ, σ, s), where μ is a location parameter, σ is a dispersion parameter and α, β and s are shape parameters. The following result holds: if T follows the GGBIII distribution (1), then Y = log(T) ∼ LGGBIII(α, β, μ, σ, s). The density function of the standardized random variable Z = (Y − μ)/σ is defined as
The special case s = 1 leads to the standard log-ZBD (LZBD, new) distribution. For β = s = 1, we obtain the log-Zografos and Balakrishnan-Fisk (LZB-F, new) distribution. The survival functions corresponding to (4) and (5) are
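As a small illustration of the log transformation used in this section, the sketch below works with the plain Burr III baseline rather than the full LGGBIII model, whose density is not reproduced here. It assumes the common two-parameter Burr III form F(t) = (1 + t^(-c))^(-k); the names c and k and the numerical values are illustrative choices, not the paper's notation. The sketch checks that Y = log(T) then has distribution function (1 + e^(-cy))^(-k), the same kind of term that appears later as u = (1 + e^(−z))^(−β).

```python
import math

def burr3_cdf(t, c, k):
    """Burr III CDF for t > 0: F(t) = (1 + t**(-c))**(-k)."""
    return (1.0 + t ** (-c)) ** (-k)

def log_burr3_cdf(y, c, k):
    """CDF of Y = log(T): P(Y <= y) = P(T <= e**y) = (1 + exp(-c*y))**(-k)."""
    return (1.0 + math.exp(-c * y)) ** (-k)

def log_burr3_pdf(y, c, k):
    """Density of Y = log(T), obtained by differentiating log_burr3_cdf."""
    e = math.exp(-c * y)
    return c * k * e * (1.0 + e) ** (-k - 1.0)

# Hypothetical parameter values, chosen only for the check below.
c, k, y0 = 2.0, 3.0, 0.3

# The two CDFs agree once t = exp(y) is substituted.
cdf_gap = abs(burr3_cdf(math.exp(y0), c, k) - log_burr3_cdf(y0, c, k))

# A central difference of the CDF should match the closed-form density.
h = 1e-6
numeric_pdf = (log_burr3_cdf(y0 + h, c, k) - log_burr3_cdf(y0 - h, c, k)) / (2 * h)
pdf_gap = abs(numeric_pdf - log_burr3_pdf(y0, c, k))

print(cdf_gap < 1e-12, pdf_gap < 1e-6)
```

The same change of variable, f_Y(y) = f_T(e^y) e^y, is what turns the GGBIII density into a location-scale form in y.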
Expansions of Density Functions
If a random variable Z has the standard LGGBIII density (5), we write Z ∼ LGGBIII(α, β, s). Let u = (1 + e^(−z))^(−β); then, using the series representation from Gradshteyn and Ryzhik (2000), we obtain
Shape of LGGBIII Distribution Function
The plots of the density (4) in Figure 2 for selected parameter values show the great flexibility of the density function in terms of the parameters α, β and s: in Figure 2a, β = 3 and s = 2; in Figure 2b, α = 2.5 and s = 2; and in Figure 2d, α = 2.5 and β = 3.
LGGBIII Quantile Function
We now give an expansion for the quantile function q = F^(−1)(p) (given p) of the LGGBIII distribution. First, we have p = F(q). It is possible to obtain q as a function of p from some expansions for the inverse of the incomplete gamma function, Q_GGBIII(p) = q, 0 < p < 1.
Moments, Moment Generating Function, Mean and Median Deviations
In this section, we present the moments, moment generating function, and mean and median deviations for the LGGBIII distribution.
Moments and Moment Generating Function
As with any other distribution, many of the interesting characteristics and features of the LGGBIII distribution can be studied through the moments. Let π½β = π½(ππΌ + π + π + π), and π~ πΏπ΅πΌπΌπΌ(π, π, π½β). Then the ππ‘β moment of the random variable π is
Mean and Median Deviations
Mean Deviation: If Y has the LGGBIII distribution, we derive the mean deviation about the mean μ by
Median Deviation: If Y has the LGGBIII distribution, we derive the median deviation about the median M by
Order Statistics of LGGBIII Distribution
Order statistics appear in many areas of statistical theory and practice. The density f_{i:n}(x) of the i-th order statistic is given by
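For reference, the standard textbook form of this density for a random sample of size n from a continuous distribution with density f and distribution function F is shown below. It is the general identity, not the paper's LGGBIII-specific expansion, which would follow by substituting the LGGBIII density and distribution function.

```latex
f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\; f(x)\,\bigl[F(x)\bigr]^{i-1}\,\bigl[1-F(x)\bigr]^{n-i},
\qquad i = 1, \ldots, n .
```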
The Log-Generalized Gamma Burr III Regression Model
In many practical applications, the lifetimes are affected by explanatory variables such as the cholesterol level, blood pressure, weight and many others. Parametric regression models for estimating univariate survival functions in censored data regression problems are widely used. A parametric model that provides a good fit to lifetime data tends to yield more precise estimates of the quantities of interest. Based on the LGGBIII density function, we propose a linear location-scale regression model linking the response variable y_i and the explanatory variable vector v_i = (v_i1, …, v_ip)^T as follows
where the random error z_i has density function (40), and γ = (B_1, …, B_p)^T, s > 0, σ > 0, α > 0 and β > 0 are unknown parameters. μ_i = v_i^T γ is the location of y_i. The location parameter vector μ = (μ_1, …, μ_n)^T is represented by a linear model μ = Vγ, where V = (v_1, …, v_n)^T is a known model matrix. Let F and C be the sets of individuals for which y_i is the log-lifetime and the log-censoring time, respectively. The log-likelihood function for the vector θ = (α, β, s, σ, γ^T)^T of parameters from model (39) has the form
where f(y_i) is the density function (4) and S(y_i) is the survival function (5) of Y_i. The log-likelihood function for θ reduces to
The MLE θ̂ of the vector θ of unknown parameters can be calculated by maximizing the log-likelihood (46). We use the NLMIXED procedure in SAS to calculate θ̂. Initial values can be taken from the fit of the log-Zog Fisk (LZFisk) regression model with β = s = 1. The fitted LGGBIII model gives the estimated survival function of Y for any individual with explanatory vector x.
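The structure of the censored log-likelihood described above (log-density terms for individuals in F, log-survival terms for individuals in C) can be sketched generically as follows. This is a minimal illustration, not the paper's SAS code: the standard logistic distribution is used only as a stand-in error law because the LGGBIII density is not reproduced here, and all function names, the event coding (1 = observed, 0 = censored) and the toy data are assumptions.

```python
import math

def logistic_logpdf(z):
    """Log-density of the standard logistic law (stand-in for the error density)."""
    a = abs(z)
    return -a - 2.0 * math.log1p(math.exp(-a))

def logistic_logsf(z):
    """Log-survival function of the standard logistic law, log(1 / (1 + e**z))."""
    if z > 0:
        return -z - math.log1p(math.exp(-z))
    return -math.log1p(math.exp(z))

def censored_loglik(y, X, event, coef, sigma,
                    logpdf=logistic_logpdf, logsf=logistic_logsf):
    """Log-likelihood of y_i = x_i' coef + sigma * z_i under right censoring.

    event[i] == 1: y_i is an observed log-lifetime, contributes log f(y_i).
    event[i] == 0: y_i is a log-censoring time, contributes log S(y_i).
    """
    total = 0.0
    for yi, xi, di in zip(y, X, event):
        mu = sum(b * v for b, v in zip(coef, xi))   # location of y_i
        z = (yi - mu) / sigma                       # standardized residual
        if di == 1:
            total += logpdf(z) - math.log(sigma)    # density of y_i, not of z_i
        else:
            total += logsf(z)
    return total

# Toy data (hypothetical): intercept plus one binary covariate.
y = [1.2, 0.7, 2.1, 1.9]
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
event = [1, 1, 0, 1]
print(censored_loglik(y, X, event, coef=[1.0, 0.8], sigma=0.5))
```

In practice the negative of such a function would be handed to a numerical optimizer, which is the role NLMIXED plays in the paper; swapping in other `logpdf` and `logsf` callables changes the error law without touching the censoring logic.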
The approximate multivariate normal distribution N_{p+5}(0, K(θ)^(−1)) for θ̂ can be used in the classical way to construct approximate confidence regions for some parameters in θ. We can use the likelihood ratio (LR) statistic for comparing some special sub-models with the LGGBIII model. We consider the partition θ = (θ_1^T, θ_2^T)^T, where θ_1 is a subset of parameters of interest and θ_2 is the subset of remaining parameters. The LR statistic for testing the null hypothesis H_0: θ_1 = θ_1^(0) versus the alternative hypothesis H_1: θ_1 ≠ θ_1^(0) is given by w = 2{ℓ(θ̂) − ℓ(θ̃)}, where θ̃ and θ̂ are the estimates under the null and alternative hypotheses, respectively. The statistic w is asymptotically (as n → ∞) distributed as χ²_r, where r is the dimension of the subset of parameters of interest.
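A minimal sketch of this LR test for a single restricted parameter (one degree of freedom) is given below. It uses the closed form P(χ²_1 > w) = erfc(√(w/2)), so only the standard library is needed. The inputs 542.5 and 524.8 are the values quoted in the breast cancer application; treating them as −2 log-likelihood values of the restricted and full fits is an assumption made here, chosen because it reproduces the reported statistic 17.70. The function name is illustrative.

```python
import math

def lr_test_1df(neg2ll_null, neg2ll_full):
    """LR statistic and p-value for one restricted parameter.

    Inputs are -2 * log-likelihood of the restricted (null) and full models,
    so w = 2 * (loglik_full - loglik_null) = neg2ll_null - neg2ll_full.
    """
    w = neg2ll_null - neg2ll_full
    # For one degree of freedom, P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

w, p_value = lr_test_1df(542.5, 524.8)
print(f"w = {w:.2f}, p < 0.0001: {p_value < 1e-4}")
```

If the two numbers were instead plain log-likelihoods, w would double to 35.4 and the p-value would be smaller still, so the conclusion of the test does not depend on this reading.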
Predicting the Time to Headache Relief
Thirty-eight patients are divided into two groups of equal size, and different pain relievers are assigned to each group. The outcome reported is the time in minutes until headache relief. The variable censor indicates whether relief was observed during the observation period (censor = 0) or whether the observation is censored (censor=1).
The variables involved in the study are
• t_i: survival time to headache relief (in minutes);
• cens_i: censoring indicator (0 or 1);
• group_i: pain reliever group (1 or 2).
Now, by fitting the model
H0: group 1 time = group 2 time vs H1: group 1 time < group 2 time
The random variable z_i follows the LGGBIII distribution (5) for i = 1, …, 38. We are interested in modelling which group recovers faster. The MLEs of the model parameters are calculated using the NLMIXED procedure in SAS. Iterative maximization of the logarithm of the likelihood function (46) starts with the initial values β0 = β1 = 1, α = 1, β = 1, σ = 1 and s = 1 to fit the regression model. Table 1 lists the MLEs of the model parameters. The values of the Akaike Information Criterion (AIC), Corrected Akaike Information Criterion (AICc) and Bayesian Information Criterion (BIC) statistics are smallest for the LGGBIII regression model. A comparison of the new model with one of its sub-models, the LZB-D model, and with the Log-Weibull (LW) model, the most useful model for fitting survival data, using the LR statistic is presented in Table 2 together with p-values. From the values of these statistics, the LGGBIII distribution provides a good fit for these data. The LGGBIII regression model outperforms the other models irrespective of the criterion and can be used effectively in the analysis of these data. So, the proposed model is a good alternative for modelling survival data.
To determine which group has the higher rate of headache relief, that is, the group with the shorter time to headache relief, recall that log(T) = Y. Since μ = v^T γ, we take log(T) = b0 − b1 × (group − 2) as the mean of the y_i. Substituting the values of b0 and b1 from Table 3 into the regression model gives the following. For group 1
These probabilities, calculated at the observed times, are shown for the two groups in Table 4.
Since the slope estimate is negative, i.e. b1 = −0.1265 with SE = 0.02272, giving a t value of −5.57 and a p-value < 0.0001, group 1 has a shorter time to headache relief than group 2; that is, pain reliever 1 leads to significantly faster relief overall. However, the estimated probabilities give no information about patient-to-patient variation within and between groups. For example, while pain reliever 1 provides faster relief overall, some patients in group 2 may respond more quickly than some patients in group 1. Figures 3 & 4 give a graphical representation of the predicted values for the two groups.
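The t value quoted above is simply the slope estimate divided by its standard error, which the short sketch below reproduces. The p-value line uses a standard normal reference distribution as an approximation; this is an assumption, since the source may have used a t distribution with the appropriate degrees of freedom.

```python
import math

# Values quoted in the text for the slope of the group effect.
estimate = -0.1265
std_error = 0.02272

# Wald-type statistic: estimate divided by its standard error.
t_value = estimate / std_error

# Two-sided p-value under a standard normal reference distribution
# (an approximation; the source's p-value may come from a t distribution).
p_normal = math.erfc(abs(t_value) / math.sqrt(2.0))

print(f"t = {t_value:.2f}")
print(f"p below 0.0001: {p_normal < 1e-4}")
```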
Predicting Time-to-Death of Breast Cancer Patient
The study cohort comprises 1207 patients with cancer treated by mastectomy. The data consist of the random response variable given by the number of months (y_i) after mastectomy. Uncensored observations correspond to patients with a recorded death time. Censored observations correspond to patients who were not observed to have died at the time the data were collected. The numbers of censored and uncensored observations are 1135 and 72, respectively, out of the total of 1207 patients. The following explanatory variables were associated with each patient (for i = 1, …, 1207):
• δ_i: the event indicator, where 1 represents the event (death) and 0 is censored;
• the size of the pathologic tumor;
• the estrogen receptor status (0 = negative, 1 = positive, 2 = unknown);
• the progesterone receptor status (0 = negative, 1 = positive, 2 = unknown);
• the tumor size category in centimeters (0, <=2, 2-5, >5);
• the lymph node involvement (0 = no, 1 = yes);
• the histologic grade (grade 1, grade 2, grade 3, 4 = unknown);
• the positive axillary lymph node.
Now, we present the result by fitting the model
where the dependent variable y_i follows the LGGBIII density function (4) for i = 1, …, 1207. The MLEs of the model parameters were calculated using the NLMIXED procedure in SAS. Iterative maximization of the logarithm of the likelihood function starts with initial values taken from the fit of the L regression model with s = 1. Tables 5 & 6 list the MLEs of the parameters for the LGGBIII and LZBD regression models fitted to the current data. The LR statistic for testing the hypothesis H0: s = 1 versus H1: H0 is not true, i.e., for comparing the LGGBIII and LZBD regression models, is w = 542.5 − 524.8 = 17.70 (p-value < 0.0001), taking 542.5 and 524.8 as the −2 log-likelihood values of the two fits, which favors the LGGBIII model. A comparison of the new model with its sub-model using the AIC, AICc and BIC criteria is given in Table 7. From the values of these statistics, the LGGBIII distribution provides a good fit for these data. The fitted LGGBIII regression model indicates that not all explanatory variables are significant at 5%.
Cox [13] proposed a very useful regression model for analyzing censored failure times, where the random variable of interest represents the failure time and the failure times are assumed identically distributed in some specified form. He noted that if the proportional hazards assumption holds (or is assumed to hold), then it is possible to estimate the selected parameter(s) without any consideration of the hazard function (non-parametric approach). This approach to survival data is called the proportional hazards model. The Cox model may be specialized if a reason exists to assume that the baseline hazard follows a parametric form. In this case, the baseline hazard can be replaced by a parametric density. Typically, we can then maximize the full likelihood, which greatly simplifies model fitting and provides interpretability at the cost of flexibility.
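The AIC, AICc and BIC comparison mentioned above rests on standard formulas that depend only on the maximized log-likelihood, the number of estimated parameters and the sample size. The sketch below states them in code; the function name and the numerical inputs are hypothetical and are not taken from the paper's tables.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, AICc, BIC) for a fitted model.

    loglik   : maximized log-likelihood
    n_params : number of estimated parameters
    n_obs    : number of observations
    """
    aic = -2.0 * loglik + 2.0 * n_params
    # Small-sample correction; requires n_obs > n_params + 1.
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, aicc, bic

# Hypothetical inputs: log-likelihood -100 with 6 parameters and 38 observations.
aic, aicc, bic = information_criteria(-100.0, 6, 38)
print(aic, round(aicc, 4), round(bic, 4))
```

Smaller values indicate a better trade-off between fit and complexity, which is the sense in which the text prefers the LGGBIII model.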
From the results of the regression model, not all the explanatory variables are significant: only the size of the pathologic tumor, the positive axillary lymph nodes and the tumor size category (in centimeters) are significant at the 5% level, and they are used to predict the survival time of the patients. Based on these three significant variables, MLEs for the new model are presented [14,15].
Table 5: MLEs of the parameters for the LGGBIII and LZBD regression models fitted to the Breast Cancer data.
From Table 9, we have πΌ = 1.499, π½ = 1.734, π = 0.156, πΏ = 1/ πΏ = 4.425,
So, the survival probability of this patient at, say, t = 200 months is obtained by inserting the parameters in the equation above, which gives
That is, patient 1113 has a 46% chance of surviving past this time. Evidently, the survival probability converges to zero when the linear predictor μ_i = v_i^T γ tends to −∞ and converges to one when the linear predictor goes to +∞. In other words, the probability of death of patients with breast cancer treated by mastectomy, for a fixed time t after the surgery, approaches one (zero) when the linear predictor becomes a very large negative (positive) number. We consider ten hypothetical patients who underwent mastectomy, with fixed values for the explanatory variables given below
A new class of generalized Burr III distributions, called the generalized gamma Burr III (GGBIII) distribution, was proposed and studied in our previous work. The GGBIII distribution has the family of Zografos and Balakrishnan distributions as special cases. The density of this new class of distributions was expressed as a linear combination of Burr III density functions. The GGBIII distribution was shown to possess a hazard function with flexible behavior. We also obtained closed-form expressions for the moments, mean and median deviations, and distribution of order statistics. The maximum likelihood technique was used to estimate the model parameters. In this work, we built on the log transformation of the GGBIII distribution, which we call the Log-Generalized Gamma Burr III (LGGBIII) model. The main motivation was to predict the survival probability of breast cancer patients after mastectomy. We compared the proposed log-transformed model with existing models such as the Log-Zografos-Balakrishnan model, the Log-Weibull model and the Cox model. It had been established in the previous work that the GGBIII distribution fitted better than its competitors, and this work extends that result to predicting the time-to-death of breast cancer patients. We selected 10 patients and calculated their survival probabilities, which were also visualized in the figure above. Additionally, we analyzed two groups of patients with headaches and predicted the time-to-headache relief, as shown in the table. In general, the proposed LGGBIII model has higher predictive power than its competitors, as established by the different goodness-of-fit criteria.
The authors would like to express their sincere gratitude to all those who have contributed in one way or the other towards the success of this paper.