Methods for the assessment of selection bias in drug safety during pregnancy studies using electronic medical data

Abstract Electronic health data are routinely used for population drug studies. Due to the ethical dilemma in carrying out experimental drug studies on pregnant women, the effects of medication usage during pregnancy on fetal and maternal outcomes are largely evaluated using this data collection medium. One major limitation in this type of study is the delayed inclusion of pregnancies in the cohort. For example, in the province of Quebec, Canada, a major pregnancy cohort only captured pregnancies after 20 weeks gestation. The purpose of this study was to demonstrate three methods that can be used to assess the extent of selection bias due to the delayed inclusion of pregnancies. We use causal directed acyclic graphs to explain the source of this selection bias. In an example involving a cohort of pregnant asthmatic women reconstructed from the linkage of administrative health databases from the province of Quebec, we use numerical derivations, a simulation study and a sensitivity analysis to investigate the potential for bias and loss of power due to the delayed inclusion. We find that this selection bias can be partially mitigated by controlling for variables related to (spontaneous or therapeutic) abortion and the outcome of interest. The three proposed methods allow for the pre and post hoc ascertainment of the bias. While delayed pregnancy inclusion selection bias (which includes “live birth bias”) can produce substantial bias in pregnancy drug studies, all three methods are effective at producing estimates of the size of the bias.

Electronic health data, extracted from health system or medical insurance administrative (claims) databases, are highly desirable due to their availability and coverage of large portions of the target population within countries' administrative divisions. 5 Such data are often used to investigate drug safety during pregnancy and there has been much discussion of methodological considerations in this setting. [6][7][8] In particular, as these data were not collected for research purposes, health conditions and medications may be only partially observed. Notably, pregnancy status may only be recorded after a gestational time threshold has been crossed. [9][10][11][12] For example, depending on the data source, pregnancy cohorts may be composed exclusively of live births [13][14][15] or pregnancies that survive past a certain threshold.
We refer to such situations as the delayed inclusion of pregnancies in the cohort.
We describe an example from the province of Quebec, Canada that aims to evaluate the safety of similarly indicated asthma medications on pregnancy outcomes. The usage of electronic health data in this example relies on an extraction of provincial medical insurance data where the cohort is defined in terms of "deliveries," that is, all pregnancies that surpass a 20 week threshold. 16 We use directed acyclic graphs (DAGs) 17 to demonstrate that the delayed inclusion of pregnancies can lead to bias in the estimation of drug safety or effectiveness. Such bias is often termed "selection" or "collider" bias and often cannot be removed using the restricted cohort and limited measured information. [18][19][20] However, one can evaluate the possible extent of the bias by imputing plausible values for several inestimable associations. 20 We therefore describe three strategies to evaluate the potential impact of the selection bias in this setting: (1) numerical derivations, which plot the resulting level of bias conditional on a plausible range of associations, (2) simulation studies, which require fixing single values for the unobservable associations, and (3) post hoc sensitivity analysis to determine whether selection bias, under plausible assumptions, could have affected the statistical conclusions of the study. Finally, the code to implement the numerical derivation and simulation study is available in the Web Appendix.

MEDICATION DURING PREGNANCY
While many of the principles evaluated in this article apply in a large range of settings, we focus on the safety of asthma medication taken by pregnant asthmatic women in a cohort of pregnancies. Current guidelines suggest that pregnant women suffering from asthma continue their standard treatment throughout pregnancy due to the dangers of uncontrolled asthma provoked by stopping therapy. 21 However, interest lies in the relative effects of different treatment options and intensities on various outcomes related to the fetus and maternal health.
The Québec Asthma and Pregnancy Database 22 was obtained through a linkage of the Régie de l'assurance maladie du Québec (RAMQ) and the MED-ECHO databases. RAMQ, the universal health care system in the province of Québec, Canada, defines delivery as all live or stillbirths occurring after the first completed 20 weeks of pregnancy. The data extraction took all deliveries between the years 1990 and 2010 for women ≤ 45 years with at least one asthma diagnosis in the 2 years prior to delivery and a random sample of other pregnant women. For inclusion, these women also had to be covered by the Québec public drug insurance plan in the year prior to and during pregnancy. Eltonsy et al 22 contrasted treatment options for different asthma severity levels on major congenital malformations recorded at birth or during the first year of life. The outcome was identified using codes from the International Classification of Diseases ninth and tenth revisions with more details provided in the original manuscript. In women with moderate asthma, they compared two alternative treatments over the span of the first trimester: (1) a low dose of inhaled corticosteroids plus the add-on therapy of long-acting β2-agonists vs (2) a higher dose of inhaled corticosteroids and no add-on therapy.
The exposure was measured over the first trimester (ie, the first 12 weeks of gestation) due to their hypothesis that this corresponds with a teratogenic window, as discussed in the original manuscript.
The investigations in this manuscript did not involve individual patient data and the study is therefore exempt from institutional ethics review.

| A DIRECTED ACYCLIC GRAPH
The identification and selection of subjects into an analysis can produce bias in the effect estimation due to selection on a collider variable. 19 In Figure 1 we present two examples of selection due to the definition of "delivery" in the RAMQ administrative health database; pregnancies are only classified as deliveries (and available in our cohort) if they surpass the 20 week mark. It is estimated that 13%-15% of pregnancies end in spontaneous abortion 23 and 21% in induced abortion (in Canada and the United States) 24 and it is thought that these numbers may be underestimated, 25 so this selection is not trivial.
The arrows (or directed edges) between variables indicate that one variable (the parent) affects another (the child) in the direction of the arrow. The dotted lines indicate correlations between two variables due to latent and temporally prior variables. A path between two variables is an unbroken route that proceeds along or against the direction of the arrows. A path is considered open unless (1) conditioning on a variable blocks the path (denoted by a square around the variable) or (2) the path goes through an unadjusted collider: a variable that is affected by two parent variables. Adjusting for a collider opens the previously blocked path. If there is an open path between the outcome and exposure other than the path of interest, estimation of the causal effect will be biased. 26

FIGURE 1 Collider Bias in Delivery Cohorts, panels (A) and (B). D represents delivery, defined as birth after 20 weeks. $A_1$ is the exposure to the medication before 20 weeks and $A_2$ is exposure after 20 weeks.

In the DAG in Figure 1, we consider a scenario where an investigator is interested in estimating the effect of exposure to a medication during early pregnancy on a birth outcome. In our example, this corresponds to early usage of asthma controller medications in the first trimester [...]

If U cannot be controlled for in the analysis, one option is to change the question of interest to investigate the effect of exposure to medication past 20 weeks ($A_2$) on the birth outcome, while adjusting for $A_1$ in order to close all backdoor paths. However, this is hardly satisfactory when early exposure is believed to be responsible for a specific birth outcome, such as for congenital malformations. 22 In addition, the estimation of the effect of $A_2$ would require that many women changed their treatment categories over the two time points (otherwise the effect of later exposure would be entirely confounded with that of $A_1$). Other options are to explore the extent of the bias using numerical derivations, simulation studies, and sensitivity analyses as we do in the next section.
A structurally equivalent type of bias has been shown to arise in cohorts defined by an index event such as disease occurrence when assessing the association between exposure to medication and an outcome such as mortality. 28,29 This index event bias has been known to at times invert the association between the exposure of interest and the outcome; for example, smoking has been shown to be protective of subsequent myocardial infarction in cohorts of patients who had a first myocardial infarction, while a harmful effect of smoking is biologically plausible. 30 In another example, index event bias was shown to explain the apparent protective effect of obesity on mortality in patients with cardiovascular disease. 31
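The path rules described for Figure 1 can be checked mechanically. The following Python sketch is not part of the original article; it tests d-separation with the moralized ancestral graph criterion on an assumed version of the simplified DAG (edges A to D, U to D, and U to Y, with the A to Y edge of interest removed so that any remaining open path is a biasing path). The node names and the helper `d_separated` are illustrative assumptions.

```python
def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated given `given` in the DAG `edges`.

    Uses the moralized ancestral graph criterion: keep only ancestors of
    the variables involved, connect co-parents, drop edge directions,
    remove the conditioning set, and look for any remaining path.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    # Ancestral set of x, y and the conditioning variables.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])

    # Moralize: link each parent to its child and co-parents to each other.
    neighbours = {node: set() for node in keep}
    for child in keep:
        pars = sorted(parents[child])
        for i, p in enumerate(pars):
            neighbours[p].add(child)
            neighbours[child].add(p)
            for q in pars[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # Search for a path from x to y that avoids the conditioning set.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node] - blocked - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True


# Assumed simplified DAG without the A -> Y effect of interest.
null_dag = [("A", "D"), ("U", "D"), ("U", "Y")]

no_selection = d_separated(null_dag, "A", "Y")               # collider D unadjusted
selected = d_separated(null_dag, "A", "Y", given=["D"])      # restricted to deliveries
adjusted = d_separated(null_dag, "A", "Y", given=["D", "U"]) # also adjusting for U
```

Under these assumptions, restricting to deliveries opens the path from A through D and U to Y, and additionally adjusting for U closes it again, which matches the verbal argument above.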

SELECTION ON BIAS AND STATISTICAL POWER
While there are many sources of bias in epidemiologic studies, the particularities of the data determine whether the bias has an important impact on the scientific conclusions. Consider the example concerning selection on births past 20 weeks. A simplified DAG is presented in Figure 1B where $A_1$ is an asthma medication in the first trimester of pregnancy and the outcome Y is major congenital malformations.
We also only consider a univariate U in the following development, though additional complexities may be added with modifications to the code. An example of such a variable (U) is antidepressant medication taken during pregnancy. This variable was not assessed and therefore not adjusted for in the Eltonsy study, making it a potential source of some selection bias. There is observational study evidence of impacts of antidepressant medications on spontaneous abortion with odds ratios between 1.1 and 1.7 32,33 and on major malformations with odds ratios between 1 and 3, [34][35][36] though a meta-analysis concluded that associations only appear to be present for cardiac malformations. 35 In using these estimates to inform our sensitivity analyses, we assume that the estimated effects on spontaneous abortions correspond to the effects on all abortions and that the effects in the general population correspond to the effects in asthmatic women. In order to express the uncertainty in these estimates, we investigate an extended range of possible effect sizes. We now describe three methods to evaluate the potential selection bias for various values of these associations.

| Numerical example
By specifying the distributions and relationships between the covariates in the DAG of Figure 1B, we can calculate the exact bias caused by the selection on deliveries. To this end, we assume that the variables are generated on a logit-linear scale with associations parameterized by conditional odds ratios. In particular, $b_A$ represents the effect size (conditional odds ratio) of exposure on outcome, while $b_U$ represents the effect of the unmeasured variable U on outcome. The parameters $t_A$ and $t_U$ represent the effects of A and U, respectively, on the probability of the pregnancy surviving the 20th week. We suppose that the baseline risks of abortion and gestational malformations are about 18% and 8%, 22 respectively, but that these risks can be exacerbated by a binary U. Letting the true effect size be $b_A = 1$ (no effect) or $b_A = 1.3$, Figure 2 displays the percent bias in the odds ratio, defined as $(\text{conditional}/\text{true} - 1) \times 100$. Even for large associations $t_A$ and $t_U$, the bias in our example remains fairly small, dropping to −10% only when being exposed and having $U = 1$ both lead to a 4-fold increase in the odds of delivery over the baseline and when the true odds ratio for the effect of interest is 1.3. These results suggest that the biased analyses would be estimating odds ratios of 0.9 if there is no effect or roughly 1.2 if the true effect corresponds to an odds ratio of 1.3.
While the bias is proportionally small, with sufficient data this could result in different scientific conclusions. Attenuating the association b U results in a reduction in bias (results given in Appendix A2).
A strength of this approach is that one can visually investigate the trends in bias while modifying the values of two parameters at a time. It is also possible to increase the complexity of the assumed DAG, for instance, by considering the two distinct types of pregnancy loss or having multiple U variables (although this will also create new parameters to either assign values to or vary over a range of possible values). One weakness of this approach is that the analyst must specify the distributions of Y and D, and the results may vary depending on this specification. In our example, we assumed that the probabilities of these binary variables are generated on the logit-linear scale, conditional on the prior variables.
In the Web Appendix A1, we provide the Mathematica (Wolfram Research, Inc, Champaign, Illinois, USA) code used to produce the graphics in Figure 2.
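As a rough illustration of the kind of calculation involved, the following Python sketch (not the Mathematica code of the Web Appendix) computes the exact odds ratio among deliveries under a logit-linear model. The prevalence of U (`p_U = 0.2`), the independence of A and U before selection, and the choice of the unselected odds ratio as the "true" reference are assumptions made here for illustration; the function names are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def selected_odds_ratio(b_A, b_U, t_A, t_U,
                        p_U=0.2, risk_loss=0.18, risk_Y=0.08):
    """Exact odds ratio of Y on A among deliveries (D = 1), summed over U.

    Assumes A and U are independent before selection and that Y and D are
    independent given (A, U), so P(A = 1) cancels out of the calculation.
    """
    risk = {}
    for a in (0, 1):
        num = den = 0.0
        for u in (0, 1):
            pr_u = p_U if u else 1.0 - p_U
            # Probability of surviving past the threshold (delivery).
            pr_d = expit(logit(1.0 - risk_loss)
                         + a * math.log(t_A) + u * math.log(t_U))
            # Probability of the outcome.
            pr_y = expit(logit(risk_Y)
                         + a * math.log(b_A) + u * math.log(b_U))
            num += pr_u * pr_d * pr_y
            den += pr_u * pr_d
        risk[a] = num / den  # P(Y = 1 | A = a, D = 1)
    return (risk[1] / (1 - risk[1])) / (risk[0] / (1 - risk[0]))


def percent_bias(b_A, b_U, t_A, t_U, **kwargs):
    """(conditional / true - 1) * 100, taking 'true' as the odds ratio
    obtained when nothing affects selection (t_A = t_U = 1)."""
    conditional = selected_odds_ratio(b_A, b_U, t_A, t_U, **kwargs)
    true = selected_odds_ratio(b_A, b_U, 1.0, 1.0, **kwargs)
    return (conditional / true - 1.0) * 100.0


# Sweep t_A and t_U over a grid for a null effect, as one might plot.
grid = [1.0, 2.0, 3.0, 4.0]
surface = {(t_A, t_U): percent_bias(1.0, 3.0, t_A, t_U)
           for t_A in grid for t_U in grid}
```

With a null effect, the bias in this sketch is zero whenever `t_A` or `t_U` equals 1 and becomes negative as both grow. The magnitudes depend on the assumed `p_U` and `b_U`, so they should not be read as reproducing Figure 2.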

| Simulation study
The 3D graphic allowed us to observe how the selection bias varies continuously with the importance of the unmeasured variable U.
Using the same data generating assumptions as in the numerical example and setting a range of values for the parameters $t_A$, $t_U$, $b_A$, and $b_U$, we can alternatively perform a Monte Carlo simulation study to estimate the expected bias and power to detect an effect of A on Y. We generated 1000 datasets, each one representing $N = 10\,000$ pregnancies subject to selection on delivery with the same baseline risks of abortion and gestational malformations (18% and 8%, respectively). For each dataset, we calculated the odds ratio for the effect of interest using (1) only deliveries and (2) a random subset of pregnancies of the same number (which emulates a setting without selection due to abortion with the same sample size). We look at both bias and power to detect an effect over a wide range of associations in Table 1. We test three small effect sizes: odds ratios of 1.1, 1.2, and 1.3. Corresponding to Figure 2, the bias remains small except in the most extreme cases where the bias reached −7.7%.
However, the power to detect an effect can be drastically reduced by the selection on $D = 1$ compared to random selection. We see the largest effects on power when the study is just barely well-powered or under-powered. For example, when $t_A = 2$ and $t_U = b_U = 3$ and the true effect was 1.2, the selection on a collider reduced the power from 68% to 42%. In the most extreme case with the smallest effect size, power was four times greater without collider bias.
While the bias in this example is relatively small, it is highly dependent on the baseline risks of outcome and selection. When increasing both baseline risks to 50%, the maximum bias increased to 25%.
One strength of this approach is that, unlike for the numerical example, one can vary multiple parameters in the same table; we varied four parameters in Table 1. As in the numerical example, one can also increase the complexity of the DAG. A particular advantage of the simulation study is that one can investigate the estimation bias, standard error, and power of the statistical estimator, while the numerical example only compares the bias in the conditional odds ratio (ie, the bias in what one would estimate with infinite data). The requirement of making arbitrary distributional assumptions is also a limitation of this approach.
In the Web Appendix A3, we provide the R software (https://www.r-project.org/) code used in this simulation study.
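The simulation design described above can be sketched as follows. This is a Python illustration written for this text, not the R code of the Web Appendix. The exposure prevalence (`p_A = 0.5`), the prevalence of U (`p_U = 0.2`), the use of an unadjusted odds ratio with a Wald test, and all function names are assumptions.

```python
import math
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def wald_odds_ratio(a, y):
    """Unadjusted odds ratio of y on a with a two-sided Wald p-value."""
    n11 = float(np.sum((a == 1) & (y == 1)))
    n10 = float(np.sum((a == 1) & (y == 0)))
    n01 = float(np.sum((a == 0) & (y == 1)))
    n00 = float(np.sum((a == 0) & (y == 0)))
    log_or = math.log(n11 * n00 / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), p_value


def simulate(b_A, b_U, t_A, t_U, n=10_000, n_sims=1000,
             p_A=0.5, p_U=0.2, risk_loss=0.18, risk_Y=0.08, seed=0):
    """Percent bias and power with selection on D = 1 vs random selection."""
    rng = np.random.default_rng(seed)
    results = {"selected": [], "random": []}
    for _ in range(n_sims):
        A = rng.binomial(1, p_A, n)
        U = rng.binomial(1, p_U, n)
        D = rng.binomial(1, expit(logit(1 - risk_loss)
                                  + math.log(t_A) * A + math.log(t_U) * U))
        Y = rng.binomial(1, expit(logit(risk_Y)
                                  + math.log(b_A) * A + math.log(b_U) * U))
        delivered = np.flatnonzero(D == 1)
        # Random subset of all pregnancies with the same sample size.
        random_idx = rng.choice(n, size=delivered.size, replace=False)
        results["selected"].append(wald_odds_ratio(A[delivered], Y[delivered]))
        results["random"].append(wald_odds_ratio(A[random_idx], Y[random_idx]))

    summary = {}
    for name, rows in results.items():
        ors, p_values = np.array(rows).T
        summary[name] = {
            "percent_bias": float((ors.mean() / b_A - 1) * 100),
            "power": float((p_values < 0.05).mean()),
        }
    return summary
```

A call such as `simulate(b_A=1.2, b_U=3, t_A=2, t_U=3)` returns the percent bias and the proportion of significant results for both selection schemes. Because U is left unadjusted and the odds ratio is noncollapsible, the random-selection estimate in this sketch can also differ slightly from `b_A`, so the comparison between the two schemes is the more informative quantity.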

| Sensitivity analysis
Sensitivity analysis can be used to evaluate the potential impact of selection bias on the scientific conclusion of a given study. Banack and Kaufman 31 demonstrate how sensitivity analysis for mediation 37 can be used to evaluate the impact of index-event bias. Starting from estimates obtained in a real study, we evaluate the potential for bias due to selection on births past 20 weeks. In Eltonsy et al 22 exposure to long-acting β2-agonists and inhaled corticosteroids was assessed during the first trimester of pregnancy. In women with moderate asthma, the contrast of interest was the relative effect of low-dose inhaled corticosteroids plus long-acting β2-agonists vs medium-dose inhaled corticosteroids therapy on the risk of major congenital malformation. This analysis did not adjust for variables that may cause both pregnancy loss before 20 weeks and the outcome as this type of selection bias was not noted at the time.

FIGURE 2 % True Bias in the Odds Ratio Caused by Selection on Deliveries in the Numerical Example. % bias = (conditional/true − 1) × 100 when the true exposure effect odds ratio is (A) 1 and (B) 1.3. Note the absence of bias when $t_U = 1$ or $t_A = 1$, that is, when D is not a collider.
Corresponding with parameter $b_A$ in Figure 1B, the true causal effect of medication on the outcome in deliveries is analogous to a controlled direct effect with a mediator D confounded by the unobserved U. If U is a single binary variable, such as antidepressant use, and assuming that all confounders of the exposure-outcome are measured and are not caused by nor cause U, sensitivity analysis can be performed using the bias formula presented in VanderWeele. 37 Suppose the parameter of interest is defined as the conditional risk ratio in mean outcomes amongst deliveries for exposed vs unexposed women. Letting C be all confounders of the association between A and Y, the correction factor for the estimated risk ratio can be given as follows: [...]

Given that such a high value for γ would indicate an unrealistically large increase in risk of induced and spontaneous abortions, it is implausible that selection bias has masked a difference in safety between the two asthma therapy options contrasted. However, due to the low power (wide confidence intervals) of the original results, this still does not provide substantial evidence that an effect does not exist.
Alternative bias formulas exist for settings in which U may also affect the exposure, though these are far less simple than the ones above. 37 An additional limitation of this approach is that only one binary variable U may be considered at a time. This matters because, while one variable (such as antidepressant use) may not produce enough bias to create a misleading effect estimate, multiple variables (such as antidepressant use, smoking, and socio-economic status) combined may have a greater impact.
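To make the mechanics of such a sensitivity analysis concrete, the following Python sketch uses a widely used simple bias factor for a single binary unmeasured variable on the risk ratio scale. It is an assumption that this form matches the correction factor intended above. The parameter names (`rr_UY` for the risk ratio of U on Y, and the prevalences of U among exposed and unexposed deliveries) and all numerical inputs are hypothetical and are not taken from the Eltonsy study.

```python
def bias_factor(rr_UY, prev_exposed, prev_unexposed):
    """Ratio of the observed to the U-adjusted risk ratio among deliveries.

    Assumes a binary U whose risk ratio on Y (rr_UY) is the same in exposed
    and unexposed women, with prevalences P(U = 1 | A = 1, D = 1) and
    P(U = 1 | A = 0, D = 1) supplied by the analyst.
    """
    return (1 + (rr_UY - 1) * prev_exposed) / (1 + (rr_UY - 1) * prev_unexposed)


def corrected_risk_ratio(observed_rr, rr_UY, prev_exposed, prev_unexposed):
    """Observed risk ratio divided by the bias factor."""
    return observed_rr / bias_factor(rr_UY, prev_exposed, prev_unexposed)


# Hypothetical inputs: vary the strength of U on Y for fixed prevalences.
observed_rr = 1.0
for rr_UY in (1.5, 2.0, 3.0):
    corrected = corrected_risk_ratio(observed_rr, rr_UY,
                                     prev_exposed=0.21, prev_unexposed=0.22)
    print(f"rr_UY={rr_UY}: corrected risk ratio = {corrected:.3f}")
```

Varying the inputs over a plausible range shows how strong the unmeasured variable, and how unbalanced its prevalence among deliveries, would have to be before the corrected estimate changes the study's conclusion.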

| DISCUSSION
In this article, we demonstrated that selection bias in electronic medical data may arise when defining the cohort on a postexposure variable such as limiting inclusion to pregnancies that pass a certain time threshold. This includes the bias that arises from selecting only on live births, since this outcome occurs after the exposure to medication during pregnancy. Recent work demonstrated the potential for bias in the estimation of exposure effects when selecting on live births in settings where exposure contributes to the abortion of fetuses. 14,15,40 Additional work demonstrated the potential for bias and accuracy loss in a very similar setting where left-truncation is differential by exposure group. 12 We extend this work by demonstrating strategies that allow the investigator to ascertain the potential impact of selection on the scientific conclusions. Indeed, we found that the magnitude of the bias depends on the particularities of the data, including the baseline risks of the outcome and selection, emphasizing the need for study-specific bias assessment.

TABLE 1 Percent bias and (in brackets) percent of significant (P < 0.05) associations in a simulation study with 1000 random generations of $N = 10\,000$ pregnancies. We contrast signal detection with selection on pregnancies past 20 weeks ($D = 1$) vs random selection of the same number of subjects (Random). All parameters used in the data generation ($b_A$, $t_A$, $t_U$, and $b_U$) are expressed as odds ratios.
In standard observational studies, investigators will typically only consider adjusting for suspected confounders between the exposure and outcome. Recognizing selection bias should motivate investigators to attempt to measure and adjust for a wider set of covariates, potentially mitigating this additional source of bias. If the additional variables are unavailable, we demonstrated how it is possible to investigate the potential for bias in a given situation. We also showed how one might alternatively assess the sensitivity of the estimated effect to different strengths of selection bias.
In addition to the single source of postexposure selection that we [...] However, this topic is beyond the scope of the current article.
Not all sources of bias in epidemiological studies threaten the overall validity of the conclusions; it is important to investigate the potential size of bias in relation to effect estimates. A greater understanding of the mechanics of statistical association can also guide attempts to reduce estimation bias. These pursuits will lead to more reliable studies and more nuanced conclusions of causal effects.