Machine learning of big data in gaining insight into successful treatment of hypertension

Abstract Despite effective medications, rates of uncontrolled hypertension remain high. Treatment protocols are largely based on randomized trials and meta‐analyses of these studies. The objective of this study was to test the utility of machine learning of big data in gaining insight into the treatment of hypertension. We applied machine learning techniques, such as decision trees and neural networks, to a large set of patients to identify determinants that contribute to the success of hypertension drug treatment. We also identified concomitant drugs not considered to have antihypertensive activity, which may contribute to lowering blood pressure (BP). Higher initial BP predicts lower success rates. Among the medication options and their combinations, treatment with beta blockers appears to be more commonly effective, which is not reflected in contemporary guidelines. Among numerous concomitant drugs taken by hypertensive patients, proton pump inhibitors (PPIs) and HMG CO‐A reductase inhibitors (statins) significantly improved the success rate of hypertension treatment. In conclusion, machine learning of big data is a novel method to identify effective antihypertensive therapy and to repurpose medications already on the market for new indications. Our results related to beta blockers stem from machine learning of a large and diverse set of big data, in contrast to the much narrower criteria of randomized clinical trials (RCTs). They should be corroborated by other methods, as they hold promise for an old class of drugs which may be presently underutilized. The previously unrecognized effects of PPIs and statins have very recently been reported to lower BP in preliminary clinical observations, lending credibility to our big data results.

chronic kidney disease. In addition to lifestyle modifications, it is typically treated with 1 or more classes of medications, which include thiazide diuretics, calcium channel blockers, angiotensin converting enzyme inhibitors, and angiotensin receptor blockers. Recommendations regarding the use of beta blockers lack consensus: some guidelines recommend their use, others do not, and some recommend them only if other medications have failed to control blood pressure (BP). 2 Expert groups in all parts of the world continuously review the accumulating evidence on the success of hypertension therapy. While there is no consensus on the best initial treatment, it is widely recognized that most patients need medications from more than 1 class; recommendations in treatment guidelines, however, are heterogeneous.
The existing recommendations are based on the available evidence, stemming mostly from RCTs and systematic reviews of RCTs.
However, no RCTs have identified an optimal dosing or drug combination strategy. 2 Most traditional algorithms in medicine are sets of rules based on existing knowledge in a specific topic (in our case hypertension).
In contrast, machine learning algorithms are a relatively new area of research in computer science and statistics, aiming to identify novel and valid patterns in data. Machine learning encompasses different modeling tools, which utilize computers to uncover "hidden insights" through learning from trends in large sets of data. 3 The objective of this study was to use this new approach to identify effective treatment choices for hypertension based on big data analysis of a large cohort of hypertensive patients. In parallel, we aimed to identify concomitant drugs not taken for hypertension which may contribute to success in lowering BP.

| MATERIALS AND METHODS
From the electronic medical charts of Maccabi Health Services, the second largest health service organization in Israel insuring more than 2 million members, 4 we identified patients receiving their first ever drug treatment for hypertension after a diagnosis had been made.
Medications utilized were identified from the electronically recorded purchases by the patient. For these patients, initial systolic and diastolic values were calculated using the mean of at least 2 measurements over the 200 days before treatment. Patients lacking at least 2 measurements in this time frame were excluded. Weight, age, BMI, and smoking status were extracted from the electronic medical charts, calculating their mean, median, maximum, minimum, and standard deviation. The success criterion was defined as achieving BP lower than 140/90 in at least 1 measurement within 90 days of treatment initiation. Measurements were performed by primary care practitioners. No standardization of measurements among physicians was attempted.
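The baseline and success definitions above can be illustrated with a minimal sketch. It is not the authors' code: the `BPReading` record, the function names, and the handling of window boundaries (readings strictly before the start date count as baseline, readings after it and up to day 90 count as follow-up) are assumptions made for illustration, and the readings are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass
class BPReading:
    """One blood pressure measurement (hypothetical record layout)."""
    day: date
    systolic: float
    diastolic: float


def baseline_bp(readings, start, window_days=200, min_readings=2):
    """Mean systolic/diastolic BP over the window before treatment start.

    Returns None when fewer than `min_readings` measurements fall in the
    window, mirroring the exclusion rule described in the text.
    """
    window_open = start - timedelta(days=window_days)
    prior = [r for r in readings if window_open <= r.day < start]
    if len(prior) < min_readings:
        return None
    return mean(r.systolic for r in prior), mean(r.diastolic for r in prior)


def treatment_success(readings, start, follow_up_days=90):
    """True if any follow-up reading is below 140/90 within the follow-up window."""
    window_close = start + timedelta(days=follow_up_days)
    return any(
        start < r.day <= window_close and r.systolic < 140 and r.diastolic < 90
        for r in readings
    )


# Invented example patient.
start = date(2015, 6, 1)
readings = [
    BPReading(date(2015, 1, 10), 160, 100),
    BPReading(date(2015, 4, 20), 150, 96),
    BPReading(date(2015, 7, 15), 138, 88),
    BPReading(date(2015, 10, 1), 130, 80),
]
baseline = baseline_bp(readings, start)
success = treatment_success(readings, start)
```

In this example the two pre-treatment readings give a baseline of 155/98, and the July reading satisfies the success criterion because it falls inside the 90-day window.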

| Machine learning methodology
"Classification" is a task in machine learning in which the data can be divided into separate categories or classes. The algorithm is attempting to predict the correct class for each data item in the repository. In our case, there were 2 classes: "treatment success" (ie, achieving BP lower than 140/90 within 90 days of treatment initiation), and "treatment failure" (any other case).
We used 2 types of machine learning algorithms for this task: decision trees 5,6 and fully connected neural networks. 7,8 The analysis was done using Python, with statistical and machine learning functions and infrastructure from the following packages and libraries: SciPy, scikit-learn (sklearn), and Keras.
In the training phase of the machine learning model, the computer is presented with a training data set. For each example in this dataset the correct classification is also given. The model strives to set its internal variables in a way that minimizes the difference between its prediction and the correct classification for each example in the training set. In the case of decision tree algorithms, the internal variables of the model represent a tree structure in which a decision is made in each branch according to the data features. In the case of neural networks, where each cell performs a simple mathematical operation, the weights given to each cell's output are adjusted in order to achieve the best prediction.
The performance of the model was tested using cross validation, which aims to reduce overfitting: 90% of the derivation data are used as a learning subset to construct a model, and its performance is examined on the remaining 10%. This process was repeated 10 times by dividing the derivation set into new and different learning and testing subsets.
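The 10-fold procedure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the study's pipeline: the data are synthetic, the outcome model is invented, shuffled (non-stratified) folds are an assumption, and the tree settings (depth 5, at least 100 samples per leaf) are borrowed from the Results section of this paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic cohort: higher initial systolic BP lowers the chance of success.
rng = np.random.default_rng(0)
n = 2000
systolic = rng.normal(150, 15, n)
diastolic = rng.normal(92, 10, n)
X = np.column_stack([systolic, diastolic])
p_success = 1 / (1 + np.exp((systolic - 150) / 10))
y = (rng.random(n) < p_success).astype(int)

# Each of the 10 rounds trains on 90% of the data and tests on the other 10%.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
fold_aucs = []
for train_idx, test_idx in kfold.split(X):
    model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    predicted = model.predict_proba(X[test_idx])[:, 1]  # P(success)
    fold_aucs.append(roc_auc_score(y[test_idx], predicted))

mean_auc = float(np.mean(fold_aucs))
```

Averaging the per-fold AUC gives a single performance figure that never scores a patient with a model trained on that same patient.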
We calculated for each drug or drug combination the success rate and the area under the receiver operating characteristic curve (AUC), where the x axis marks the false positive rate (1 − specificity) and the y axis shows the true positive rate (sensitivity). The "positive" set contained patients who received the drug treatment and met the criterion of "BP lower than 140/90 within 90 days of treatment initiation"; the "negative" set contained patients who received the drug treatment while NOT meeting the criterion. The closer the AUC is to 1.0, the better the overall performance of the model. 9
In an attempt to eliminate as much patient variability among drug choices as possible, we used propensity score matching to examine whether a specific drug treatment/combination achieved independently higher success rates. 10 We used the following patient characteristics for the matching: hypertension drug treatment, initial BP, weight, age, BMI, and smoking status. Re-sampling was allowed in the matching process (ie, the same patient could be matched to several patients from the original group). The basic idea behind matching is to pair 1 group of observations with another group of observations in such a way that the items in the groups are as similar as possible in all aspects except for the tested variable. In our case, given a group of patients treated with drug x, we aimed to match every patient with a patient who is identical to him/her in age, weight, BMI, etc., except for the fact that the matched patient was not treated with drug x.
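The ROC curve and AUC used in this section to score the classifiers can be made concrete with a small, dependency-free sketch. The function names, labels (1 = treatment success, 0 = failure), and scores are invented for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, true positive rate) for every threshold.

    labels: 1 for treatment success, 0 for failure.
    scores: predicted probability of success from some classifier.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        false_pos = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(labels, scores))  # 8/9, approximately 0.889
```

The value 8/9 has a direct reading: of the 9 possible (success, failure) patient pairs, the classifier ranks the successful patient higher in 8.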

| Effects of concomitant drugs on hypertension
In addition to antihypertensive medications, our dataset contained records of all other purchases of prescribed pharmaceuticals given by the health care providers to hypertensive patients.
Patients from the untreated group were matched to patients from the treated group based on the propensity score.
We performed an exhaustive search over all treatment groups, excluding those that were bought by <200 patients, identifying 73 such groups. For each treatment group, we compared hypertension treatment success rates of the group of patients treated with that specific treatment and a matched group of patients that were not treated with that specific treatment. Based on the entire data base, logistic regression was used for predicting the probability of treatment success with the matched drug and this constituted the propensity score. For each patient in the treated group we matched a patient untreated with that specific treatment with the closest propensity score.
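A minimal sketch of nearest-neighbor propensity score matching with replacement is given below. It assumes the conventional definition of the propensity score (the estimated probability of receiving the treatment given the covariates), which may differ from the authors' exact formulation; the function name, the feature standardization, and the synthetic cohort are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def propensity_match(features, treated):
    """Match each treated patient to the untreated patient with the closest score.

    Matching is done with replacement, so one untreated patient may be
    matched to several treated patients.
    """
    # Standardize the covariates so the solver converges quickly.
    scaled = (features - features.mean(axis=0)) / features.std(axis=0)
    model = LogisticRegression(max_iter=1000).fit(scaled, treated)
    score = model.predict_proba(scaled)[:, 1]  # estimated P(treated | covariates)

    treated_idx = np.flatnonzero(treated == 1)
    control_idx = np.flatnonzero(treated == 0)
    # Absolute score gap between every treated and every untreated patient.
    gaps = np.abs(score[treated_idx][:, None] - score[control_idx][None, :])
    matched_idx = control_idx[gaps.argmin(axis=1)]
    return treated_idx, matched_idx


# Synthetic cohort (invented numbers): older patients are more likely treated.
rng = np.random.default_rng(0)
n = 1000
age = rng.normal(60, 10, n)
systolic = rng.normal(150, 15, n)
features = np.column_stack([age, systolic])
treated = (rng.random(n) < 1 / (1 + np.exp(-(age - 60) / 10))).astype(int)

treated_idx, matched_idx = propensity_match(features, treated)
```

Success rates can then be compared between `treated_idx` and `matched_idx`, two groups of equal size whose covariate distributions should be similar.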
Pearson's chi-squared test was used to determine whether the success rates differed among groups. To account for multiple hypothesis testing, the P-values were corrected according to the Bonferroni correction. We present the 5 smallest chi-squared P-values together with the corrected P-value (ie, original P-value times number of hypotheses tested).
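The test and correction described above can be sketched with SciPy. The counts below are invented, the function name is hypothetical, and two choices are assumptions not stated in the text: the Yates continuity correction is switched off to obtain the plain Pearson statistic, and the corrected P-value is capped at 1. The number of hypotheses, 73, is the number of treatment groups reported in the text.

```python
from scipy.stats import chi2_contingency


def compare_success(treated_success, treated_total,
                    control_success, control_total, n_hypotheses):
    """Pearson chi-squared P-value and its Bonferroni-corrected value."""
    table = [
        [treated_success, treated_total - treated_success],
        [control_success, control_total - control_success],
    ]
    # correction=False gives the plain Pearson statistic (no Yates correction).
    _, p_value, _, _ = chi2_contingency(table, correction=False)
    corrected = min(1.0, p_value * n_hypotheses)
    return p_value, corrected


# Invented counts: 300/500 successes with the drug vs 250/500 in matched controls.
p_value, corrected_p = compare_success(300, 500, 250, 500, n_hypotheses=73)
```

With these invented counts the raw P-value is below .05 but the corrected value is not, which shows why the correction matters when 73 groups are screened at once.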
We used the following patient characteristics for the matching: hypertension drug treatment, initial BP, weight, age, BMI, and smoking status. Treatment groups were excluded according to the rate of re-sampling and Kolmogorov-Smirnov (KS) goodness of fit tests for all features. 11 We chose a re-sampling rate of 20%, with P < .0001 for a single feature as our limit for a group's exclusion. That is, if the KS test for 1 of the features we matched had P > .001, we considered it to be an ill fit and discarded the treatment group.
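The two quantities used for excluding treatment groups can be computed as in the sketch below. The function names are hypothetical, and the definition of the re-sampling rate (the share of matches that reuse an already matched untreated patient) is an assumption, since the text does not define it. The sketch only reports the per-feature KS P-values and the re-sampling rate; applying the exclusion limits quoted in the text is left to the reader.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_balance(treated_features, matched_features, names):
    """Two-sample KS P-value for each matched feature (one column per feature)."""
    return {
        name: ks_2samp(treated_features[:, j], matched_features[:, j]).pvalue
        for j, name in enumerate(names)
    }


def resampling_rate(matched_idx):
    """Share of matches that reuse an already matched untreated patient."""
    matched_idx = np.asarray(matched_idx)
    return 1.0 - np.unique(matched_idx).size / matched_idx.size


# Synthetic illustration: age is well balanced, weight is clearly shifted.
rng = np.random.default_rng(0)
treated_group = np.column_stack([rng.normal(60, 10, 500), rng.normal(80, 12, 500)])
matched_group = np.column_stack([treated_group[:, 0], rng.normal(95, 12, 500)])

report = ks_balance(treated_group, matched_group, ["age", "weight"])
rate = resampling_rate([4, 4, 7, 9])  # one of four matches is a reuse
```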
However, given that other matching parameters were correct, such an ill-fitted feature may be interpreted as another factor (in addition to the treatment group) for hypertension treatment success.

| RESULTS
Based on our exclusion criteria, the resulting dataset contained 30 705 patients, whose characteristics are presented in Table 1.
Most patients (17 234) were initially treated with 1 class of antihypertensive drugs, 9176 were initially treated with 2 drug types, and 3425 and 867 patients were treated with 3 and 4 drugs, respectively (Table 2). ACE inhibitors and ARBs were the most common treatment, used by 73% of the patients. Beta blockers were prescribed to 47% of the patients, making them the second most common treatment. These rates held both in overall prescriptions and when analyzing drug combinations (Table 3).
Beta blockers had the highest success rate among the different drug groups either by themselves, in 2 drug combinations (Table 4), or in 3 drug combinations (Table 5).
We used 3 variants of decision tree classifiers for predicting treatment success: decision tree, random forest, and xgboost. In all cases the maximal tree depth was set to 5 with a minimum of 100 samples per leaf. These classifiers achieved an average AUC of 0.7 (Table 6). The important predictors in all variants were as follows: the initial systolic value, the difference between the initial systolic and diastolic values, and the initial diastolic value. The lower these observed values were, the greater the likelihood that treatment would prove successful. Additional predictive features were weight, age, and BMI, but these were less prominent.
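How such classifiers and their predictor rankings might be obtained with scikit-learn is sketched below. The data and outcome model are synthetic, the feature names are invented, the number of trees in the forest is an arbitrary choice, and xgboost is omitted to keep the example free of extra dependencies. The depth and leaf-size settings are the ones stated above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic cohort: success becomes less likely as initial BP rises.
rng = np.random.default_rng(1)
n = 3000
systolic = rng.normal(150, 15, n)
diastolic = rng.normal(92, 10, n)
weight = rng.normal(80, 12, n)
names = ["systolic", "systolic_minus_diastolic", "diastolic", "weight"]
features = np.column_stack([systolic, systolic - diastolic, diastolic, weight])
logit = (systolic - 150) / 10 + (diastolic - 92) / 10
y = (rng.random(n) < 1 / (1 + np.exp(logit))).astype(int)

models = {
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, min_samples_leaf=100, random_state=0),
    "random_forest": RandomForestClassifier(
        n_estimators=50, max_depth=5, min_samples_leaf=100, random_state=0),
}

results = {}
for model_name, model in models.items():
    # Mean AUC over 10 cross-validation folds.
    mean_auc = cross_val_score(model, features, y, cv=10, scoring="roc_auc").mean()
    # Refit on all data to read off the impurity-based feature importances.
    importances = model.fit(features, y).feature_importances_
    results[model_name] = (float(mean_auc), dict(zip(names, importances)))
```

Impurity-based importances sum to 1 for each model, so they rank predictors relative to one another rather than measuring an effect size.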

| Effects of concomitant drugs on hypertension treatment success
As seen in Table 8

The model created through these steps could then be applied to new, previously unused data. 3,9 In our case, machine learning algorithms allowed us to identify novel and valid patterns in hypertension treatment data which can-

a Classification task is "success" or "failure" in controlling blood pressure as defined in the methodology.
TABLE 8 Success in causing an antihypertensive effect by concomitant drugs not aimed for hypertension. Proton pump inhibitors and statins achieved the highest significance levels

Treatment group                   Chi-squared P-value   Corrected P-value
Proton pump inhibitors            <.3 × 10⁻⁶            <.3 × 10⁻⁶
HMG CO-A reductase inhibitors     <.3 × 10⁻⁷            <7.2 × 10⁻⁵
Platelet aggregation inhibitors   <1.6 × 10⁻³           <7 × 10⁻¹
Antimycotic + steroid             <1.7 × 10⁻²           <.24
Corticosteroids, inhaler          <2.7 × 10⁻²           <.2

To account for multiple comparisons, a significant antihypertensive effect was set at corrected P < .001.

The common denominator of all these studies is that they compared beta blockers to other single drugs. In contrast, our study examined the effectiveness of combinations of antihypertensive drugs in successful treatment of hypertension. It shows that when a beta blocker is given either alone or in combination, it exhibits significantly higher success rates as compared to all other classes of drugs (Table 3). This held true when a beta blocker was given in 2 drug (Table 4) or 3 drug combinations (Table 5).

In a recent meta-analysis, there was a significant reduction in plasma asymmetric dimethylarginine (ADMA) by statins. Endothelial dysfunction may be associated with increased circulating ADMA, 14 and this may be a potential mechanism for this BP lowering effect of statins.
In 2014 Joya-Vazquez and colleagues analyzed records of hypertensive patients according to regular use of PPI. 15  In summary, these previously unrecognized effects of PPIs and statins have been very recently identified in preliminary clinical observations, lending credibility to our novel data science methodology. This experience suggests that data science methodology using machine learning may be an effective means for repurposing medications already on the market, for new indications.

DISCLOSURES
None declared.