Discovering Co-Occurring Medical Complications For Cocaine Users In African American Diabetic Kidney Patients: A Text Mining Method

The use of cocaine exacerbates kidney disease, and African Americans are more likely to use cocaine. This study explored the variety of medical problems African American diabetic kidney patients experience


(CKD) [21][22][23]. Smoking in particular accelerates the progression of CKD in diabetic patients [21] and is linked to high blood pressure (BP) [24]. Collectively, the studies reviewed above consistently report positive correlations between cocaine use and high BP; however, it is not clear whether systolic BP (SBP), diastolic BP (DBP), or both are elevated. Recent studies claim that SBP is more closely associated than DBP with the development of kidney disease and cardiovascular disease among type 2 diabetic (T2D) patients [25][26][27]. SBP is the main factor in determining hypertension [28], and achieved DBP is not correlated with progression of kidney disease [27].
Another report shows that 95% of patients with uncontrolled hypertension had elevated SBP levels, while DBP was elevated in only 50% of patients [29]. Taken together, these findings point to SBP as a better predictor of ESKD than DBP [28][30][31][32][33].
Although existing studies have improved our understanding of the relationship between cocaine use and its consequential medical problems, they have some limitations. First, although their purpose is to discover the medical problems that follow from cocaine use, those studies did not isolate cocaine-only users; they included tobacco users, which confounds the findings. This may explain why the medical problems reported for cocaine use alone and for cocaine use combined with tobacco are similar. Second, studies consistently report that cocaine elevates BP as a whole, whereas other research shows that SBP is a better predictor of ESKD than DBP. Again, because tobacco users are in the samples, this finding may be confounded. Furthermore, SBP and DBP may be associated differently with the various medical problems caused by cocaine and tobacco use.
Therefore, the purpose of this research is to investigate three questions using a machine learning technique. First, what kinds of co-occurring medical problems do diabetic kidney patients have if they use only cocaine? Second, what kinds of co-occurring medical problems do diabetic kidney patients have if they use both cocaine and tobacco? Third, how are the co-occurring medical problems associated with combined cocaine and tobacco use related differently to high SBP and high DBP? Because African Americans are more likely to use cocaine, this study focuses on this population group.

Data and Data Processing
The data used in the study were obtained from the Cerner HealthFacts® Data Warehouse (CHFDW). The data extraction was based on the International Classification of Diseases (ICD) 9 encoding. The CHFDW is a fully Health Insurance Portability and Accountability Act (HIPAA) compliant data repository. Because the dataset had already been collected by a third party, the Institutional Review Board (IRB) for the Protection of Human Subjects determined that this study did not meet the criteria for human subject research. Because the purpose of the paper is to discover frequently appearing co-occurring medical problems among African American diabetic kidney patients who use cocaine, T2D patients were extracted based on the International Classification of Diseases, Ninth Revision (ICD-9, 250.00-250.93). They were then further filtered to those who were outpatients, African Americans, kidney patients (GFR stages between 3 and 5 including kidney patients), and cocaine users. The filtered patient records are as follows: total T2D records, 1,038,499; all kidney patients, 25,480; African American kidney outpatients who use cocaine, 439. The process of the data extraction is shown in Figure 1. Cocaine use was determined from the urine test. Because the dataset combines tobacco and smoking, we use the term "tobacco" for both. Overweight includes overweight, obesity, and extreme obesity because of the low counts of obesity and extreme obesity. Cough combines cough, cough for more than 3 weeks, and cough & deep breath. It is possible that a patient could have all three symptoms.
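The filtering steps described above can be sketched in Python as follows. This is only an illustration and not the extraction code used in the study: the record fields (`icd9`, `patient_type`, `race`, `gfr_stage`, `cocaine_urine_positive`) and their value encodings are hypothetical and do not reflect the actual CHFDW schema.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # All field names and value encodings are hypothetical, for illustration only.
    icd9: str
    patient_type: str
    race: str
    gfr_stage: int
    cocaine_urine_positive: bool


def is_t2d_code(icd9: str) -> bool:
    """True if the code falls in the ICD-9 range 250.00-250.93 given in the text."""
    major, _, minor = icd9.partition(".")
    return major == "250" and len(minor) == 2 and minor.isdigit() and int(minor) <= 93


def in_cohort(r: Record) -> bool:
    """Apply the filters in the order described: T2D, outpatient, race, GFR stage, cocaine."""
    return (
        is_t2d_code(r.icd9)
        and r.patient_type == "outpatient"
        and r.race == "African American"
        and 3 <= r.gfr_stage <= 5
        and r.cocaine_urine_positive
    )


# Invented toy records; only the first one passes every filter.
records = [
    Record("250.40", "outpatient", "African American", 4, True),
    Record("250.40", "inpatient", "African American", 4, True),
    Record("250.00", "outpatient", "African American", 2, True),
    Record("401.9", "outpatient", "African American", 3, True),
    Record("250.93", "outpatient", "African American", 3, False),
]
cohort = [r for r in records if in_cohort(r)]
```

Parsing the code as a string rather than a floating-point number avoids rounding questions at the range boundaries.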

Analytical Strategy
Apriori was chosen as the analytical strategy for this research because this machine-learning technique can efficiently discover frequently co-occurring medical problems. Apriori is an unsupervised machine-learning algorithm that searches all the variables in the database and retrieves only frequently appearing itemsets. It does not require prior knowledge or pre-selected variables but uncovers the medical comorbidities that frequently appear with cocaine use, which makes it well suited to discovering new knowledge. What counts as a frequently appearing itemset can be subjective, so the technique lets researchers set selection criteria using support, confidence, and lift values and use the same values to validate the findings. Support is the fraction of transactions that contain an itemset. Confidence is the fraction of transactions containing the left-hand-side items that also contain the right-hand-side items. Lift is the ratio of how often X and Y occur together to how often they would be expected to occur together if they were statistically independent. Lift greater than 1 indicates complementary items, and lift less than 1 indicates substitutes. Because these criteria limit the findings, they control the exponential growth of the results, and thus this method is popular among data scientists. Because this analytical strategy is widely used in marketing, it is often referred to as market basket analysis. In marketing, it is used to discover which items a customer is likely to buy together with, for example, milk.
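To make the three measures concrete, the following Python sketch computes support, confidence, and lift for a single rule. The transactions are invented toy data, and the function names are chosen here for illustration; they are not taken from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(transactions, both) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Observed co-occurrence relative to what independence would predict."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Invented toy data: each set is one patient's recorded problems.
transactions = [
    frozenset({"cocaine", "tobacco", "high_sbp"}),
    frozenset({"cocaine", "high_sbp"}),
    frozenset({"tobacco", "cough"}),
    frozenset({"cocaine", "tobacco", "high_sbp", "cough"}),
    frozenset({"cough"}),
]

rule_support = support(transactions, {"cocaine", "high_sbp"})          # 3 of 5 transactions
rule_confidence = confidence(transactions, {"cocaine"}, {"high_sbp"})  # all cocaine records
rule_lift = lift(transactions, {"cocaine"}, {"high_sbp"})              # confidence / support(rhs)
```

In this toy example the rule cocaine → high_sbp has support 0.6, confidence 1.0, and lift of about 1.67, so the two items co-occur more often than independence would predict.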

Since this dataset has 3,194 medical problems, support is set to 0.00035. Confidence, which measures how reliably certain medical problems occur with the use of cocaine, is set to 100%. Using these criteria, the algorithm retained 1,013 frequently appearing medical problems out of 3,194 and discovered 2,203 rules for the male group, while zero rules were discovered for the female group. This may be because the use of cocaine is higher among African American men than women [12][34]. This finding is consistent with another study that reported that about 76% (37 of 49 subjects) of cocaine users were men while only 24% were women [4]. Table 1 shows five examples of the discovered results. The first rule shows that cocaine users have the characteristics of tobacco use, overweight, high creatinine, high SBP, and respiratory problems. The lhs (left-hand side) and the rhs (right-hand side) express a correlation, not causation, meaning that those six medical problems co-occur. Because the lhs variables include tobacco and overweight, it makes more sense to interpret the rule as follows: if diabetic kidney patients are overweight and use both cocaine and tobacco, the co-occurring medical problems are high creatinine, high SBP, and high respiratory issues. The support was 0.0000376, meaning that the chance of this set of six medical problems co-occurring, among the 1,013 medical problems, is 0.0000376. The confidence is 100%, and the lift indicates that these six items are 4,285 times more likely to co-occur than they would be if they occurred independently.
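The mining step with a support and a confidence threshold can be sketched as a level-wise Apriori search in plain Python. This is a minimal illustration under stated assumptions, not the software used in the study (which is not named here): the toy transactions and the support threshold of 0.3 are invented, whereas the study used a support of 0.00035 and a confidence of 100%.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count candidates of size k and keep those meeting the support threshold.
        level = {}
        for cand in current:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) for rules meeting the threshold."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, rhs, supp, conf, conf / frequent[rhs]))
    return rules


# Invented toy data: each set is one patient's recorded problems.
transactions = [
    frozenset({"cocaine", "tobacco", "high_sbp"}),
    frozenset({"cocaine", "high_sbp"}),
    frozenset({"tobacco", "cough"}),
    frozenset({"cocaine", "tobacco", "high_sbp", "cough"}),
    frozenset({"cough"}),
]
itemsets = frequent_itemsets(transactions, min_support=0.3)
found = association_rules(itemsets, min_confidence=1.0)
```

With a confidence of 1.0, only rules whose right-hand side appears in every transaction containing the left-hand side survive, for example {cocaine, tobacco} → {high_sbp} in the toy data.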

Co-occurring medical problems among cocaine without tobacco
To address the threat of confounding that arises from including cocaine and tobacco users in the same analysis, we included only sole cocaine users in the first analysis and discovered 2,203 rules. If all of these rules were reported individually, no systematic pattern of co-occurring medical problems would be visible, and the report would be meaningless. Because the purpose is to discover frequently co-occurring medical problems among cocaine users, only the most frequently co-appearing medical problems were kept, and medical problems appearing below 10% were removed.
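One way to summarize thousands of rules is to tally how often each medical problem appears across them and drop the rare ones. The sketch below assumes that "appearing below 10%" refers to the share of rules that mention a problem; that reading, the function names, and the toy rules are assumptions made for illustration.

```python
from collections import Counter


def item_shares(rules):
    """Share of rules (lhs or rhs) in which each item appears."""
    counts = Counter(
        item for lhs, rhs in rules for item in set(lhs) | set(rhs)
    )
    return {item: count / len(rules) for item, count in counts.items()}


def keep_frequent(shares, min_share=0.10):
    """Drop items below `min_share` and order the rest from most to least frequent."""
    kept = [(item, share) for item, share in shares.items() if share >= min_share]
    return dict(sorted(kept, key=lambda pair: -pair[1]))


# Invented toy rules as (lhs, rhs) pairs: 20 rules in total.
rules = (
    [({"cocaine"}, {"high_sbp"})] * 12
    + [({"cocaine"}, {"cough"})] * 7
    + [({"cocaine"}, {"fever"})] * 1
)
shares = item_shares(rules)
summary = keep_frequent(shares)
```

In the toy data, "fever" appears in only 5% of the rules and is removed, while the remaining problems are ranked by how often they co-appear.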
Consistent with existing studies, this research also found high creatinine, cardiovascular issues, elevated blood pressure, and respiratory complications [6][7][8][9][10][11][12]. A new finding is that cocaine use elevates SBP more than DBP. In fact, high SBP is the most frequent medical problem among cocaine users [34]. Although this dataset has been filtered to kidney patients only, high BNP problems occur because the use of cocaine can damage the heart as well [35]. Additional noteworthy findings are cough, high prothrombin, low red blood cell count (RBC), night sweats, fever, low white blood cell count (WBC), and low neutrophil. It seems that low RBC results from alcohol rather than from cocaine use, because chronic alcohol consumption leads to deficiencies of various vitamins, including folate deficiency, which disrupts hematopoiesis and in turn leads to low RBC [36,37]. Folate deficiency leads to anemia, which is again common among alcoholics [38]. Despite the high occurrence of coughing, high respiratory problems, and respiratory problems with cocaine use, little research has investigated those relationships. Although studies show that cocaine and tobacco users have asthma and chronic obstructive pulmonary disease (COPD) [20] and respiratory problems [39], those findings include tobacco users. The finding further shows that overweight patients are more likely to use cocaine and alcohol.
Tobacco use appeared in 24.19% of cocaine users in this study. In this section, we drilled further down into the data to discover the complications from combined cocaine and tobacco use. The question asked is: "If diabetic kidney patients use both cocaine and tobacco, what kinds of medical problems are they likely to have?" Based on these criteria, 703 rules were discovered.
Figure 3 shows that high respiratory problems become the number one medical problem (36.27%), up from 4th (21.80%) in Figure 2, and high BNP-B also jumped to 2nd from 6th in Figure 2. Clearly, tobacco use exacerbates respiratory problems and damages the heart. This finding also shows that SBP is elevated more than DBP. Patients who use tobacco tend to have high WBC to fight the inflammation and damage caused by tobacco, yet this study found low WBC in conjunction with the use of tobacco and cocaine, which is paradoxical. However, one study shows that the use of tobacco decreased RBC and WBC [40]. Another study found that WBC count is inversely associated with alcohol consumption among both nonsmokers and smokers [41]. As such, one could speculate that the low WBC in this finding may be related to alcohol, but further investigation is required. Another notable finding is that overweight and alcohol have increased from Figure 2, suggesting that overweight diabetic kidney patients are likely to use multiple substances. Unexpectedly, cardiopulmonary has dropped to 9th (14.94%) from 5th (21%) in Figure 2. An existing study reported that smokers have a higher rate of survival than non-smokers after cardiopulmonary resuscitation (CPR), which is referred to as the smoker's paradox [42].

Co-occurring medical problems, high SBP versus DBP among cocaine users
Scholars have claimed that high SBP is the more pressing issue, especially for kidney patients [25][26][27]. The finding of this study shows that the relative importance of SBP and DBP is contingent upon the medical problem, as shown in Figure 4. Cough is the most noteworthy problem in this finding: if diabetic kidney patients use both cocaine and tobacco and have a cough, they are likely to have both high SBP and high DBP, although SBP is slightly higher (41.23% vs. 38.93%). High SBP is highly associated with high BNP and low WBC, while high DBP is correlated with high respiratory, night sweats, low RBC, overweight, and shortness of breath. In particular, if overweight diabetic kidney patients use cocaine and tobacco, they are more likely to have high DBP (25.00%) than high SBP (13.37%). The same pattern is observed in patients with shortness of breath (19.47% vs. 10.30%) and high respiratory (21.52% vs. 13.50%).
Although this finding shows that SBP and DBP have different degrees of association with various medical problems among African American diabetic kidney patients, DBP has a greater impact than SBP on the various comorbidities associated with cocaine and tobacco use. Lastly, no clear difference between SBP and DBP is observed for high prothrombin, high creatinine, and cardiopulmonary.
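One way to read the SBP versus DBP comparison is as a conditional share: among the rules that contain high SBP (or high DBP), what fraction also contain a given medical problem. How the percentages in Figure 4 were computed is not spelled out here, so the sketch below is an assumed interpretation with invented toy rules and a hypothetical helper name.

```python
def share_with(rules, target, item):
    """Among rules containing `target`, the fraction that also contain `item`."""
    with_target = [rule for rule in rules if target in rule]
    if not with_target:
        return 0.0
    return sum(item in rule for rule in with_target) / len(with_target)


# Invented toy rules, each flattened to the set of items on both sides.
rules = [
    {"cocaine", "tobacco", "high_sbp", "high_bnp"},
    {"cocaine", "tobacco", "high_sbp", "cough"},
    {"cocaine", "tobacco", "high_dbp", "overweight"},
    {"cocaine", "tobacco", "high_dbp", "cough"},
    {"cocaine", "tobacco", "high_dbp", "overweight", "cough"},
]

# Compare how strongly each problem goes with high SBP versus high DBP.
comparison = {
    item: (share_with(rules, "high_sbp", item), share_with(rules, "high_dbp", item))
    for item in ("cough", "overweight", "high_bnp")
}
```

Each entry of `comparison` holds the pair (share with high SBP, share with high DBP), which mirrors the side-by-side percentages reported in the text.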

Discussion
Using the Apriori machine-learning technique, this study investigated what kinds of co-occurring medical problems African American diabetic kidney patients have if they use only cocaine. It then drilled down to investigate the effect of combined tobacco and cocaine use and, finally, how SBP and DBP are associated with the medical problems discovered from the use of tobacco and cocaine. The contributions of this study are as follows. First, this study separated cocaine users from tobacco users and ran the analyses separately to observe the independent effects of cocaine use on medical comorbidities. This analysis showed that cocaine users have high SBP, cough, and high prothrombin problems, among others. Second, to investigate the effects of using both tobacco and cocaine, the next analysis combined these two groups. The finding reveals that high respiratory and high BNP problems become the top medical problems, followed by high SBP. Third, this study investigated how SBP and DBP are differently associated with the medical problems derived from cocaine and tobacco use. While high SBP is associated with high BNP and low WBC, this study clearly shows that DBP has more impact on a variety of medical problems.
More specifically, being overweight, having shortness of breath, and having high respiratory problems are much more likely to be associated with high DBP than with high SBP. This finding is somewhat contradictory to existing studies. We speculate that the contradiction arises because our study categorized various medical problems and examined each one against SBP and DBP, while existing studies investigated a single correlation, such as between cocaine and BP, tobacco and BP, or ESKD and BP. Lastly, unlike a cohort study or a traditional research method that relies on existing studies, Apriori does not assume prior knowledge but searches all the data, discovers the most frequently co-occurring medical problems, and thereby uncovers new knowledge. Using this method, this study discovered new medical issues such as low RBC, low WBC, cough, and high prothrombin, which are not well recorded in existing studies. This study, however, has some limitations. First, the findings are based on the CHFDW; although it is arguably the largest electronic medical records (EMR) dataset, diabetic patients who are not in this database are not included in this analysis. Second, we used the 2012 diabetic patient data, which was the most recent complete dataset in 2016. Although we do not anticipate that patients' co-occurring medical problems change abruptly from year to year, we still caution the reader.

For future studies, we recommend including medications in order to observe whether gender or racial background affects the medications taken for the same medical issues. Furthermore, including medications in the analysis may help explain the low white blood cell count. To validate the findings, a meta-study is recommended.