Determinants of Generic Drug Use Among Medicare Beneficiaries: Predictive Modelling Analysis Using Artificial Intelligence

Determinants

counterparts [2]. The financial savings that generic drugs bring to the healthcare system are substantial [1]. The FDA reports that generic medications cost patients, on average, 80 to 85 percent less than their brand-name equivalents [2]. Although generics account for 89 percent of drugs dispensed, they account for only 26 percent of total drug costs in the U.S. [1]. In 2016 alone, generics saved $253 billion, while generics under Medicare saved $77 billion ($1,883 per enrollee) [1]. Such savings can be reinvested in medical research and the development of new treatments [1]. A recent study indicated that the use of generic medications was associated with clinical outcomes comparable to those of their branded counterparts, specifically for chronic conditions [3]. Under the Medicare Act, the policy approach is to encourage generic drug use. However, it does also cover branded
Promoting generic substitution is known to yield substantial savings in the Medicare drug benefit program [5]. According to the U.S. Department of Health and Human Services (HHS), the use of generic drugs could have saved $3 billion for the Medicare Part D program in 2016 [6]. Furthermore, if generic substitution were applied program-wide, Part D could potentially save $5.9 billion a year [6]. CMS reported that although 90% of prescriptions dispensed in the U.S. are for generic drugs, Medicare Part D beneficiaries spent $1.1 billion out of pocket in 2016 alone on branded drugs with generic equivalents [7]. While there are indications that the chance of filling a generic prescription is higher under Medicare plans, not much is known about the prevalence of generic drug use and its predictors among Medicare beneficiaries. Knowing the predictors of generic drug use would have policy implications, especially as the Medicare program changes from time to time [8]. The scope of artificial intelligence (AI) has been widely recognized in the US healthcare system and under Medicare [9].
AI integrates the scientific principles of philosophy, mathematics and computer science to understand and develop systems that display and emulate properties of human intelligence [10]. AI is a branch of computer science that enables the creation of machines that work and react like humans, whether through training, supervision or automation [11]. These machines can complement or replace human intelligence and skills in a healthcare setting. However, AI has mostly been discussed as a disruptive technology that supports or replaces human skills to enhance the availability and quality of healthcare [12]. Its potential for data exploration and research has not been widely explored [13].
In health research, machine learning is the most recognized AI tool. Machine learning uses algorithms and a wide range of statistical models to learn predictive associations from examples in data [10]. It is well suited to recognizing patterns in large, raw data sets. Identifying these patterns helps reveal healthcare-seeking patterns, quality of care, and their determinants in complex healthcare systems. Machine learning thus supports quick decision making at low cost and in little time.
Although machine learning is one of the most tangible manifestations of AI, with wide scope in healthcare research, it is still an emerging concept in health research globally and in the USA [14].

Objectives
The objectives of the study were two-fold. First, using the Medicare Current Beneficiary Survey (MCBS) data, it quantified the national prevalence of generic drug use among Medicare beneficiaries. Second, it identified the predictors of generic drug use in this population through the application of artificial intelligence (i.e., machine learning). In short, this study sought to generate novel evidence on generic drug use among Medicare beneficiaries and to apply machine learning to complex Medicare data for predictive modelling.

Outcome Variable
A binary dependent variable, use of generic drugs, was created for the analysis. In the MCBS data, the item "ever asked for generic drug" was collected with three possible responses: "never", "sometimes" and "often". We recoded the responses "sometimes" and "often" to "ever", creating a dichotomous variable for generic drug use with the values "ever" and "never".
Race included four categories: "non-Hispanic whites", "non-Hispanic blacks", "Hispanics" and "others". There were three age groups: below 65 years, 65 to 75 years, and above 75 years. Marital status consisted of four categories: "married", "widowed", "divorced/separated", and "never married". Socio-economic predictors were education, annual income, and place of stay. There were three education categories: "less than high school", "high school or vocational, technical, business, etc.", and "more than high school". Annual income was dichotomized into income below and above $25,000.
Place of stay was also a binary variable, distinguishing respondents from metro and non-metro regions. Insurance predictors consisted of dual coverage (Medicare and Medicaid), whether the plan covered drugs, Part D coverage, and enrollment in Medicare Advantage and private insurance. All insurance predictors were binary variables with "yes" or "no" responses. The number of limitations in activities of daily living (ADLs) was the health status predictor. ADL limitations are difficulties in caring for oneself as a result of a health or physical issue. Caring for oneself includes activities such as bathing, showering, dressing, eating, getting in or out of bed or chairs, or using the toilet. The ADL predictor was coded with three responses: none, one, and two or more. Healthcare utilization predictors were the number of outpatient office visits and the number of inpatient stays. Both the outpatient office visit and inpatient stay variables were categorized into six responses: "no office visit", "1 to 5 office visits", "6 to 10 office visits", "11 to 15 office visits", "16 to 20 office visits", and "21 or more office visits".
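The recoding of the outcome variable described above can be sketched in a few lines. This is a minimal illustration only: the function name, the handling of unexpected responses, and the example values are assumptions, not the MCBS variable names or the authors' code.

```python
# Mapping from the three survey responses to the binary outcome.
# "sometimes" and "often" are collapsed into "ever", as described in the text.
RECODE = {"never": "never", "sometimes": "ever", "often": "ever"}


def recode_generic_use(response):
    """Collapse the three-level response into "ever" / "never".

    Anything outside the three expected responses (for example a refusal
    or a missing value) is returned as None. That choice is an assumption
    made for this sketch.
    """
    if response is None:
        return None
    return RECODE.get(response.strip().lower())


# Hypothetical example responses, not real survey data.
responses = ["never", "sometimes", "often", "Often", "refused"]
outcome = [recode_generic_use(r) for r in responses]
print(outcome)
```

Keeping the mapping in a single dictionary makes the collapse of "sometimes" and "often" explicit and easy to audit.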

Statistical Methods
Descriptive analyses were conducted for the predictors, and sample characteristics were presented by sub-group under each predictor as weighted proportions. Correlation among all predictors was tested with Pearson's correlation coefficient.
Enrollment in the Medicare Advantage Plan was highly correlated with outpatient visits (correlation coefficient -0.66) and private insurance (correlation coefficient -0.83). Thus, enrollment in the Medicare Advantage Plan was dropped from the list of predictors.
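The correlation screening described above can be illustrated with a short sketch that computes Pearson's coefficient for every pair of predictors and flags strongly correlated pairs. The 0.6 cutoff, the function names, and the toy 0/1 data are assumptions for illustration; the source does not state the threshold that was applied.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated(columns, threshold=0.6):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    The default threshold is an assumed cutoff, not one taken from the paper.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Toy binary predictors (1 = yes, 0 = no); not real survey data.
toy = {
    "medicare_advantage": [1, 1, 1, 0, 0, 0, 1, 0],
    "private_insurance": [0, 0, 0, 1, 1, 1, 0, 1],
    "dual_coverage": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(flag_correlated(toy))
```

A predictor that appears in one or more flagged pairs is a candidate for removal before model fitting.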
Bivariable analyses were performed using Rao-Scott tests to assess possible associations between the dependent variable and the predictors [16]. Separate Rao-Scott tests were conducted by year cohort (2015 and 2016) and for the pooled cohort. Associations between generic drug use and predictors (demographic, socioeconomic, insurance, and healthcare utilization) were estimated using a multivariable logistic regression model. Associations were considered statistically significant if the p-value was below 0.05. All estimates were weighted using sample weights to represent the population of all "ever-enrolled" Medicare beneficiaries. Artificial intelligence, in the form of a machine learning algorithm, was used to improve the predictive modelling of the predictors of generic drug use [9]. We used an ensemble model (random forest) to predict the outcome. Random forest is a supervised machine learning algorithm that combines many decision trees [17]. Decision trees recursively partition the inputs (predictors).
The algorithm sequentially fits new attributes to predict the output. In our model, an ensemble of 501 decision trees was used, and trees were grown to a maximum depth of 10. First, the random forest model was trained on 80% of the observations and validated on the remaining 20% for predictive strength. A ten-fold cross-validation was then performed in which the data were randomly split into 80% training and 20% test observations ten times, and the average over these ten splits was taken as the final prediction estimate. We tested three random forest models on the pooled cohort sample, based on different variable selections. In the first model, generic drug use was  [18,19].
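The random forest setup described above (501 trees, maximum depth 10, ten random 80%/20% splits with averaged results) could be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the source does not name the software used, the data here are synthetic, the splits are stratified by outcome as a precaution, and sample weights are omitted. The text calls the procedure ten-fold cross-validation, but what it describes is repeated random splitting, which is what the sketch implements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit


def make_toy_data(n=300, seed=0):
    """Synthetic stand-in for coded predictors; only column 0 carries signal."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, 4)).astype(float)
    logit = 2.0 * X[:, 0] - 2.0
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y


def evaluate_random_forest(X, y, n_estimators=501, max_depth=10,
                           n_splits=10, test_size=0.2, seed=0):
    """Average accuracy, AUC and feature importance over random splits."""
    # Stratification is an assumption; the text only says "randomly".
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                      random_state=seed)
    accuracies, aucs = [], []
    importances = np.zeros(X.shape[1])
    for train_idx, test_idx in splitter.split(X, y):
        model = RandomForestClassifier(n_estimators=n_estimators,
                                       max_depth=max_depth, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Predicted probability of the positive class ("ever" used generics).
        prob = model.predict_proba(X[test_idx])[:, 1]
        pred = (prob > 0.5).astype(int)
        accuracies.append(accuracy_score(y[test_idx], pred))
        aucs.append(roc_auc_score(y[test_idx], prob))
        importances += model.feature_importances_
    return (float(np.mean(accuracies)), float(np.mean(aucs)),
            importances / n_splits)


X, y = make_toy_data()
acc, auc, importance = evaluate_random_forest(X, y)
print(f"accuracy={acc:.3f} auc={auc:.3f}")
print("mean feature importance:", np.round(importance, 3))
```

Averaging the impurity-based importances across splits gives one ranking of predictors per model, comparable in spirit to a variable importance plot.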

Participants
As shown in Table 1   achieved the best balance between sensitivity (92.3%) and specificity (13.7%). This model also outperformed the other two models in terms of accuracy (62.8%) and AUC (58.2%). Figure 1 presents the ROC curves and AUC for the three models. In the combined model, number of outpatient visits, marital status, race, education and age were the most important predictors of generic drug use (Figure 2).
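For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity and accuracy follow from the counts of correct and incorrect predictions. The function name and the toy labels are assumptions for illustration; they are not the study data.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = "ever").

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),  # share of actual users identified
        "specificity": tn / (tn + fp),  # share of non-users identified
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy labels: a classifier that predicts "ever" for most observations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```

A model that labels most beneficiaries as generic drug users will show high sensitivity and low specificity, which is the pattern in the values reported above.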

Discussion
Our study showed that the odds of generic drug use were relatively higher among Medicare beneficiaries who were below 65 years, were non-Hispanic white, had education above high school, were married, had no dual coverage, had no private insurance, had Part D coverage, had more than two limitations in activities of daily living, and had more than 20 outpatient office visits. In the 2015 cohort alone, being male and having a lower annual income (<$25,000) were also associated with a higher chance of generic drug use, unlike in the 2016 cohort. The pooled cohort showed associations similar to those in the 2015 cohort, except for gender. In the predictive analysis using machine learning, number of outpatient visits, marital status, race, education and age were the most important predictors of generic drug use in the pooled cohort.
A recent study also showed that dispensing of generic drugs has been consistently high (74%) among Medicare beneficiaries compared with the commercial beneficiary population in the last few years [20]. Existing evidence also reflects a higher proportion of generic drug use, especially for chronic conditions, among white populations [20]. There are also indications of increased healthcare utilization, especially for annual wellness visits, among non-Hispanic whites [15]. In contrast to our findings, a recent study found high generic drug use among older adults, specifically for chronic health conditions (e.g., thyroid disorders) [20]. Our study showed this probability to be higher among adults under 65 years. We did not observe this trend, as we did not examine drug dispensing patterns for specific health conditions. The nature of the disease and health condition could be a driver of drug dispensing [21]. Higher generic drug use among adults below 65 years than among those above 65 could perhaps be due to medical necessity for prescription drugs [20].
Similar to our findings in the 2015 cohort, another study also found that the use of branded drugs was relatively lower among males [21]. This gender difference in dispensing generic drugs was not observed in the 2016 cohort, indicating that perhaps the awareness of and need for cost-effective generic drugs have eventually spread to both genders under the Medicare program [1]. Prevailing evidence confirms our finding that beneficiaries with Part D coverage are more likely to use generic drugs [22]. One of the reasons could be prescription formulary benefit designs that target increased use of low-cost generic drugs under Part D coverage [23].
Additional policy measures, such as not increasing generic drug prices and widening the availability of generic drugs, including fast-tracking generic drug applications, will further support this reliance on generic drugs under other components of the Medicare program.
Typically, if healthcare utilization involves higher out-of-pocket expenditure on branded drugs, only higher income groups will be inclined to seek care and use branded drugs [15].
Our study also found that generic drug use was higher among lower income groups. Lower affordability could be a reason for the direct association between number of outpatient visits and generic drug use in the study. Lower income groups can better afford generic drugs, and generic drugs are associated with better patient compliance, especially among low-income groups [1]. Around 20 percent of brand-name prescriptions were abandoned, compared with 7.7 percent of generic prescriptions, among approved claims under the Medicare program in 2016 [1]. Since generic drugs are more cost saving, policy strategies are needed to encourage generic drug use even among higher income groups under the Medicare program. Although health conditions can be predictors of generic drug use, this study did not consider health conditions [24]. CMS asks respondents whether they have ever been diagnosed with a specific health condition.

Conclusion
This study finds that socio-economic and demographic variables along with insurance characteristics play a significant role in the chance and level of generic drug use under the Medicare program.
Policy strategies to encourage generic drug use among higher income groups, non-Hispanic Blacks, less educated beneficiaries, private insurance holders and beneficiaries with Medicare Part D coverage may be relevant. Machine learning could be applied further to understand predictors of generic drug use and other health parameters in large, complex, raw data sets in the USA and elsewhere.

Author contributions
AKD, HB and SSG conceptualized the study design, analyzed the data and drafted the manuscript. All authors approved the final version.

Conflict of Interest
None declared by the authors. Views expressed in the paper are those of the authors and do not necessarily reflect those of their organizations.