Volume 49, Issue 6 p. 591-596

Validation of the diagnosis of venous thromboembolism in general practice database studies

R. Lawrenson

European Institute of Health and Medical Sciences, University of Surrey, Guildford, Surrey and

J.-C. Todd

European Institute of Health and Medical Sciences, University of Surrey, Guildford, Surrey and

G. M. Leydon

Department of Epidemiology and Population Health, Cancer and Public Health Unit, London School of Tropical Medicine, London

T. J. Williams

European Institute of Health and Medical Sciences, University of Surrey, Guildford, Surrey and

R. D. T. Farmer

European Institute of Health and Medical Sciences, University of Surrey, Guildford, Surrey and

First published: 24 December 2001
Dr R. Lawrenson, European Institute of Health and Medical Sciences, Stirling House, Stirling Road, Surrey Research Park, Guildford, Surrey, GU2 5RF. Tel.: 01483 302239; Fax: 01483 300359


Aims The study was conducted to determine whether the method for selecting cases of venous thromboembolism (VTE) from general practice databases significantly affected the findings of an epidemiological study.

Methods Cases of VTE were identified from the UK General Practice Research Database (GPRD) by searching for codes for deep vein thrombosis (DVT) and pulmonary embolism (PE). Each diagnosis had to be supported by evidence of anticoagulation, and the woman had to be exposed to a combined oral contraceptive (COC) at the time of the event. Additional information about the event was sought from general practitioners, who were requested to complete a questionnaire and to supply anonymised copies of hospital letters and discharge summaries.

Results Of the 285 cases identified from the GPRD, additional information was available for 177 VTE events. This information showed that 84% of those events were supported by hospital investigations or a death certificate. Using only verified cases, rather than all GPRD identified events, did not alter the results of the epidemiological study.

Conclusions The GPRD provides information of sufficiently high quality to allow valid epidemiological research of VTE events. Excluding cases without a database record of hospital admission would lead to valid events being overlooked, and an under-estimate of the disease incidence.


There have been a number of studies conducted over the last 3 years using general practice databases to look at the association between venous thromboembolism (VTE) and the use of combined oral contraceptives. In response to the results of one of these studies [1] (as well as two hospital-based case-control studies [2, 3]), the Committee on Safety of Medicines issued a warning to all UK general practitioners about the safety of combined oral contraceptives containing desogestrel and gestodene [4]. This warning led to a reduction in the use of all combined oral contraceptives, an increase in teenage pregnancies and an 8.6% increase in legal abortions [5].

In 1997 we published a study based on the MediPlus database that showed no difference in the risk associated with combined oral contraceptives containing either levonorgestrel, desogestrel or gestodene [6]. It has been pointed out that in this study the diagnosis of VTE in the selected cases was not validated with hospital discharge records or notes held by the general practitioners [7–9]. Furthermore the study included cases with a diagnosis of venous thromboembolism and evidence of anticoagulant treatment, but did not require a computer record of hospital admission.

It is a concern of any general practice database study that selection bias is avoided and all valid cases are included in the study. Unavoidably some cases will go undetected, as some patients with VTE may never present to their general practitioner. Other events may be misdiagnosed, so that hospital referral and treatment do not occur. Using certain treatments as a basis for case selection may also disqualify potential cases – for example, some patients may die from pulmonary embolism before it is possible to administer anticoagulants. As VTE is a potentially fatal disease, we believe that cases that have died as a result should always be included. We also argue that a database code for hospital admission is not necessary, because general practitioners in the UK would rarely continue to treat VTE with anticoagulants without reference to a specialist.

The original paper by Jick et al. [1] used a subset of the practices that contribute to the General Practice Research Database (GPRD). The study selected patients who had a diagnosis of VTE, a prescription for an anticoagulant, a record of hospital admission and a record of a combined oral contraceptive containing either levonorgestrel, desogestrel or gestodene. Attempts were then made to validate all the identified cases. Deep vein thromboses confirmed by venogram or Doppler ultrasound, and pulmonary emboli supported by VQ scans or angiography, were accepted as probable cases. In addition to the 51 confirmed cases, the final analysis included 29 cases designated as possible: 13 patients who were treated clinically or on the basis of equivocal results, and 16 who met the selection criteria but for whom no further records were available. All other cases, including fatalities, were excluded.

In another study using GPRD [10] we compared the risk of VTE between the most commonly used combined oral contraceptives. Fatalities were included and specifically sought. We attempted to verify all fatal and nonfatal cases, with information from the general practitioner or death certificates.

The aim of this study was to ascertain whether the way that VTE cases were selected from the database significantly affected the findings of the oral contraceptive study.


This investigation used the UK General Practice Research Database (GPRD). The GPRD has been described elsewhere [1, 11, 12] and has been used extensively in epidemiological research generally and specifically for studies concerned with the assessment of the safety of medicines.

We identified as potential cases all women who had a diagnosis of first event of deep venous thrombosis or pulmonary embolism, had evidence of treatment with an anticoagulant and had a record of a prescription for a combined oral contraceptive (COC). The full computer record of each of these women was printed and assessed blind to the type of COC exposure. Women were accepted as cases of VTE if they had one of the above diagnoses followed by evidence of anticoagulation, and were exposed to a COC on the day of the event. The database was also searched for all women who died from pulmonary embolism or deep venous thrombosis, and all women who had died and for whom there was no recorded cause of death. In order to restrict the study to idiopathic VTE, women were excluded if:

  • they were pregnant at the time (signified as a record of a delivery within 38 weeks of the event);

  • within 42 days of the event they:
    had delivered a baby;
    had a termination of pregnancy;
    had surgery requiring a general anaesthetic;
    had major trauma to the lower limbs;
    had evidence of malignant disease;
    were using other sex hormones concurrently with the oral contraceptive;
    had significant congenital heart disease; or

  • the event was associated with a drug overdose.

Cases were also excluded if there was less than 6 months of research standard data available in the record prior to the event. (The management committee of the GPRD determines when the data supplied by each practice is of research standard based on the completeness and accuracy of records.) The same inclusion and exclusion criteria were applied to fatal cases as for nonfatal, except that a record of anticoagulation was not a requirement.
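The inclusion and exclusion rules above can be read as a single eligibility check. The following is a minimal sketch under stated assumptions: the `CandidateCase` record and all of its field names are hypothetical and do not correspond to GPRD fields, "6 months" is taken as 183 days, and the pregnancy rule is interpreted as a delivery recorded up to 38 weeks after the event.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

EXCLUSION_WINDOW_DAYS = 42       # exclusion window stated in the text
PREGNANCY_WINDOW_DAYS = 38 * 7   # "delivery within 38 weeks of the event"
MIN_PRIOR_DATA_DAYS = 183        # assumption: "6 months" taken as 183 days


@dataclass
class CandidateCase:
    """Hypothetical summary of one woman's record (not a GPRD schema)."""
    fatal: bool
    anticoagulated: bool
    on_coc_at_event: bool
    days_of_prior_data: int
    # Days from the event to a later recorded delivery, if any (assumed reading).
    days_to_delivery: Optional[int] = None
    # Days before the event on which any listed risk factor was recorded
    # (delivery, termination, surgery, trauma, malignancy, other hormones, ...).
    risk_factor_days_before: Tuple[int, ...] = ()
    drug_overdose: bool = False


def is_eligible(c: CandidateCase) -> bool:
    """Return True if the record passes the inclusion and exclusion rules."""
    if not c.on_coc_at_event:
        return False
    # Anticoagulation is required for nonfatal cases only.
    if not c.fatal and not c.anticoagulated:
        return False
    if c.days_of_prior_data < MIN_PRIOR_DATA_DAYS:
        return False
    if c.days_to_delivery is not None and 0 <= c.days_to_delivery <= PREGNANCY_WINDOW_DAYS:
        return False
    if any(0 <= d <= EXCLUSION_WINDOW_DAYS for d in c.risk_factor_days_before):
        return False
    return not c.drug_overdose
```

For example, a fatal case without a record of anticoagulation remains eligible, whereas a nonfatal case with surgery recorded 30 days before the event does not.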

Where possible, we attempted to validate the VTE event of each selected case. For all cases where the computer record indicated the patient was still registered with the practice at the time the research was carried out, letters were sent to the doctor asking for additional information to confirm the diagnosis. The relevant general practitioners from the practices were asked to complete a questionnaire and to send anonymised copies of the hospital letters and discharge summaries relating to the event. To maintain the anonymity of both patients and general practitioners, this additional information was sought through the General Practice Database Research Company. No further enquiries were made for patients who had transferred to other practices, as it was assumed that all written records would have been sent on to the new practice and would therefore not be available. Similarly, we did not approach practices that had switched from the VAMP Medical System to another software company or had stopped supplying data for other reasons. The questionnaire asked whether the patient had been admitted to hospital, whether they had been treated with an anticoagulant and whether there was a prior history of venous thromboembolism. On the basis of this extra information, each case was assigned to one of three validation categories:

(1) definite cases where the diagnosis was confirmed by hospital tests or a death certificate;

(2) equivocal cases where tests were either negative or equivocal, yet treatment was instituted;

(3) unknown cases where either no information could be sought, or was not sent or available.

For all VTE fatalities an anonymised copy of the death certificate was sought.
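The three validation categories can be expressed as a small classification function. This is an illustrative sketch only: the `test_result` labels and the function name are assumptions, and the tally treats all 28 equivocal cases as "equivocal" because the split between negative and equivocal tests is not reported.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Validation(Enum):
    DEFINITE = 1   # confirmed by hospital tests or a death certificate
    EQUIVOCAL = 2  # tests negative or equivocal, yet treatment was instituted
    UNKNOWN = 3    # no information sought, sent or available


def categorise(test_result: Optional[str], death_certificate: bool = False) -> Validation:
    """Assign a validation category.

    test_result is a hypothetical label: "positive", "negative",
    "equivocal", or None when no further information was obtained.
    """
    if death_certificate or test_result == "positive":
        return Validation.DEFINITE
    if test_result in ("negative", "equivocal"):
        return Validation.EQUIVOCAL
    return Validation.UNKNOWN


# Tally using the counts reported in the Results (141 confirmed nonfatal
# cases, 8 fatalities, 28 equivocal cases, 108 without further information).
tally = Counter(
    [categorise("positive")] * 141
    + [categorise(None, death_certificate=True)] * 8
    + [categorise("equivocal")] * 28
    + [categorise(None)] * 108
)
```

With those counts the tally contains 149 definite, 28 equivocal and 108 unknown cases, which sum to the 285 cases analysed.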

We then undertook a nested case-control study. For each case, a group of up to four controls was randomly selected. The controls were matched to the case by practice and exact year of birth. All controls were exposed to a COC on the event date of the case.
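The matching step can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the `Woman` record, its fields and `select_controls` are hypothetical, and the sketch does not address details the text leaves open, such as whether one woman may serve as a control for several cases.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Woman:
    """Hypothetical record; the fields are illustrative, not GPRD fields."""
    woman_id: int
    practice: str
    birth_year: int
    coc_periods: List[Tuple[date, date]] = field(default_factory=list)

    def on_coc(self, day: date) -> bool:
        """True if any recorded COC exposure period covers the given day."""
        return any(start <= day <= end for start, end in self.coc_periods)


def select_controls(
    case: Woman,
    event_date: date,
    pool: List[Woman],
    max_controls: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Woman]:
    """Randomly pick up to max_controls women matched on practice and exact
    year of birth who were exposed to a COC on the case's event date."""
    rng = rng or random.Random()
    eligible = [
        w for w in pool
        if w.woman_id != case.woman_id
        and w.practice == case.practice
        and w.birth_year == case.birth_year
        and w.on_coc(event_date)
    ]
    # An empty result corresponds to a case with no available controls.
    return rng.sample(eligible, min(max_controls, len(eligible)))
```

A case for which the function returns an empty list has no matched controls and cannot contribute to a matched analysis.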

The primary analyses were conditional logistic regressions using STATA, focusing on formulation rather than progestogen. The five most frequently used formulations were included. These were levonorgestrel 150 μg + ethinyloestradiol 30 μg, desogestrel 150 μg + ethinyloestradiol 30 μg, gestodene 75 μg + ethinyloestradiol 30 μg, desogestrel 150 μg + ethinyloestradiol 20 μg and triphasic levonorgestrel (50/75/125) μg + ethinyloestradiol (30/40/30) μg. All other COCs were combined together as ‘Others’ in a sixth category. Additional variables included in the model were BMI, smoking, asthma and the number of non-OC prescriptions issued in the 6 months preceding the event date. This last variable was used as a proxy general health measure.

Odds ratios (OR) with 95% confidence intervals were calculated for all cases. In order to evaluate the effect of differing selection criteria amongst database studies, we also conducted four further analyses. The first used only those cases and their matched controls in whom the diagnosis was verified as definite, the second used those cases for whom the diagnosis was categorized as equivocal, the third used only the unverified cases and the fourth used definite and equivocal cases together.
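As a reading aid for the odds ratios and confidence intervals reported later, the sketch below shows how a regression coefficient on the log-odds scale maps to an OR with a 95% interval. It assumes Wald-type intervals, which the paper does not state, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Recover the approximate standard error of log(OR) from a reported interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def excludes_one(lower: float, upper: float) -> bool:
    """True if the interval excludes an OR of 1 (conventional 5% significance)."""
    return lower > 1 or upper < 1
```

For instance, an interval of (1.8, 4.2) excludes 1, whereas an interval of (0.4, 1.5) does not; both intervals appear in Table 2.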


Data were available from 618 practices that had been providing data of acceptable standard to the GPRD at some time between 1989 and 1997. The maximum size of the database was between 1991 and 1992, when there were about 1.1 million women born between 1945 and 1982 registered with the participating practices. During the period covered by the study there were a total of 6.4 million women years of observation, about 1 million of which were exposed to oral contraceptives. Initially 296 cases were identified from the database. However, in matching by practice and year of birth, 11 cases were orphaned, i.e. there were no available controls. These women were mostly over 40 years of age. This left 277 nonfatal cases of VTE and eight deaths. To these we matched 1098 controls. Table 1 sets out the principal information that was abstracted from each case and control, and the percentages of cases and controls in each category and exposed to each COC formulation. As can be seen from Table 1, the differences between the verified, equivocal or unverified case groups were small. Only 5/48 variables tested were statistically significant. In the case of desogestrel and 20 µg EE the majority of identified cases were verified. Another variable that showed a statistical difference was current smoking, where the group of 28 cases who had negative tests but were treated on clinical grounds were more likely to be current smokers.

Table 1. Comparison of control characteristics with cases (verified, equivocal and unverified).

| COCs/characteristics | % controls (n = 1086) | % cases (n = 285): verified (n = 149) | equivocal (n = 28) | unverified (n = 108) |
|---|---|---|---|---|
| Levonorgestrel 150 μg + EE 30 μg | 20 | 21 | 21 | 23 |
| Desogestrel 150 μg + EE 30 μg | 20 | 20 | 21 | 25 |
| Gestodene 75 μg + EE 30 μg | 19 | 22 | 18 | 21 |
| Desogestrel 150 μg + EE 20 μg | 7 | 9 | 4 | 2* |
| Triphasic levonorgestrel | 12 | 7 | 11 | 11 |
| All other combined oral contraceptives | 22 | 22 | 25 | 18 |
| BMI < 25 | 63 | 43 | 57 | 52 |
| BMI 25–30 | 15 | 24 | 18 | 13* |
| BMI 30 + | 8 | 17 | 11 | 15 |
| BMI unknown | 15 | 17 | 14 | 20 |
| No asthma | 90 | 79 | 86 | 83 |
| Asthma | 10 | 22 | 14 | 17 |
| Non-smokers | 65 | 54 | 39 | 46 |
| Smokers | 28 | 34 | 57* | 32 |
| Smoking status unknown | 7.8 | 12 | 4 | 21* |
| No non-OC scripts | 36 | 21 | 11 | 24 |
| 1–2 non-OC scripts | 20 | 15 | 36* | 13 |
| 3 + non-OC scripts | 44 | 64 | 54 | 63 |
| Age 15–24 years | 28 | 26 | 14 | 32 |
| Age 25–34 years | 56 | 52 | 68 | 55 |
| Age 35–49 years | 16 | 22 | 18 | 13 |
| Year of event 1992/1993 | 36 | 38 | 46 | 29 |
| Year of event 1994/1995 | 47 | 48 | 43 | 47 |
| Year of event 1996/1997 | 17 | 15 | 11 | 24 |

* P < 0.05 when comparing characteristics of verified cases with equivocal or unverified cases.

Of the 277 nonfatal cases we sought further information on 186 which we believed were still registered with a currently contributing practice. We received replies for 183 cases from the general practitioners. In 14 cases however, no further information was available because the patient had since left the practice. Therefore of 172 cases for whom information was potentially available, we received details and hospital letters from 169 (98%).

In all but two cases (99%) the GP confirmed that the patient had been admitted to hospital. In one case the patient had a venogram as an outpatient and was treated without being admitted, and in the other the patient was treated at a hospital accident and emergency department on the basis of clinical signs and a positive Doppler ultrasound. This contrasts with the computer record, where only 81% had a record of admission as an in-patient.

In the 169/277 (61%) cases for which further information was available, the diagnosis was accompanied by either venogram, Doppler ultrasound or ventilation perfusion (VQ) scan. In 141 of these the tests were fully supportive and these cases were categorized as ‘definite’. In 28/169 (17%) cases the diagnosis was equivocal and the patient was treated on clinical grounds alone.

All eight fatalities from pulmonary embolism were supported by a death certificate, and in most cases by additional information from a postmortem, and were accordingly categorized as definite cases. Fifty-two per cent (149/285) of all cases were confirmed as definite, i.e. confirmed by tests or death certificates. For 108 cases no further information was sent or available (Table 1). Of those cases for which further information was available, 83% (141/169) of the anticoagulation-supported diagnoses of VTE were supported by hospital investigations.
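The counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only figures given in the text; the variable names are illustrative. It shows that the 83% figure corresponds to the 141 test-confirmed nonfatal cases out of the 169 with further information, while the 84% quoted in the abstract corresponds to 149 definite cases (including the eight fatalities) out of 177 events with additional information.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


nonfatal, fatal = 277, 8
total = nonfatal + fatal                              # 285 cases analysed

sought, replies, left_practice = 186, 183, 14
potentially_available = sought - left_practice        # 172
with_information = replies - left_practice            # 169 nonfatal cases

equivocal = 28
definite_nonfatal = with_information - equivocal      # 141
definite = definite_nonfatal + fatal                  # 149
events_with_information = with_information + fatal    # 177
unverified = total - definite - equivocal             # 108

summary = {
    "response among available": pct(with_information, potentially_available),  # 98
    "nonfatal cases with information": pct(with_information, nonfatal),        # 61
    "equivocal among informed nonfatal": pct(equivocal, with_information),     # 17
    "confirmed among informed nonfatal": pct(definite_nonfatal, with_information),  # 83
    "definite among all cases": pct(definite, total),                          # 52
    "definite among informed events": pct(definite, events_with_information),  # 84
}
```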

In 11 cases the hospital records provided additional information that, had it been available on the computer system, would have led to the cases not being considered idiopathic. These were due to information regarding trauma or immobilization before the VTE event and within the 42 day exclusion period.

Four conditional logistic regressions were carried out. The results of these are shown in Table 2. No significant differences for formulation were found when all cases were used, or when only verified cases were used in the analysis. Using verified cases did cause an increase in the OR of desogestrel + 20 μg EE from 0.8 to 1.3. Only very small differences were noted in the other formulations. Using only verified cases for the analysis still showed that a raised BMI, current smoking, asthma and the proxy variable non-OC scripts were all significantly associated with VTE.

Table 2. Conditional logistic regressions comparing ORs by COC formulation of verified, equivocal and unverified cases with the ORs of all cases. All values are OR (95% CI); the last four columns are the validation groups.

| Formulation | All cases | Adjusted | Verified cases (Group 1) | Equivocal cases (Group 2) | Unverified cases (Group 3) | Verified and equivocal cases (Groups 1 + 2) |
|---|---|---|---|---|---|---|
| Levonorgestrel 150 μg + ethinyloestradiol 30 μg | REFERENCE | REFERENCE | | | | |
| Desogestrel 150 μg + ethinyloestradiol 30 μg | 1.1 (0.7, 1.7) | 1.0 (0.7, 1.7) | 1.0 (0.5, 2.0) | 1.2 (0.2, 6.7) | 1.1 (0.5, 2.3) | 1.0 (0.6, 2.5) |
| Gestodene 75 μg + ethinyloestradiol 30 μg | 1.1 (0.7, 1.7) | 1.3 (0.8, 2.0) | 1.2 (0.6, 2.4) | 1.6 (0.3, 8.3) | 1.5 (0.7, 3.1) | 1.3 (0.7, 2.4) |
| Desogestrel 150 μg + ethinyloestradiol 20 μg | 0.8 (0.4, 1.5) | 0.8 (0.4, 1.5) | 1.3 (0.6, 3.1) | n/a | 0.3 (0.1, 1.4) | 1.1 (0.5, 2.5) |
| Triphasic levonorgestrel | 0.7 (0.4, 1.1) | 0.7 (0.4, 1.2) | 0.6 (0.2, 1.3) | 0.8 (0.1, 4.7) | 0.9 (0.4, 2.0) | 0.6 (0.3, 1.2) |
| All other combined oral contraceptives | 0.9 (0.6, 1.3) | 1.0 (0.6, 1.5) | 1.1 (0.6, 2.0) | 1.0 (0.2, 4.4) | 0.7 (0.4, 1.5) | 1.2 (0.7, 2.1) |
| BMI 25–30 | 1.7 (1.2, 2.4) | 1.6 (1.1, 2.4) | 2.4 (1.3, 3.8) | 0.9 (0.2, 4.0) | 1.0 (0.5, 2.1) | 2.0 (1.2, 3.2) |
| BMI 30 + | 2.8 (1.8, 4.2) | 2.4 (1.5, 3.8) | 2.5 (1.3, 4.6) | 1.7 (0.2, 11.5) | 2.9 (1.3, 6.4) | 2.3 (1.3, 4.1) |
| Unknown BMI | 1.7 (1.1, 2.6) | 1.1 (0.7, 1.8) | 1.1 (0.6, 2.2) | 1.7 (0.3, 0.2) | 1.3 (0.6, 2.8) | 1.1 (0.6, 2.0) |
| Asthma | 2.2 (1.5, 3.2) | 1.9 (1.3, 2.9) | 2.1 (1.2, 3.7) | 2.7 (0.6, 13.0) | 1.4 (0.7, 2.8) | 2.1 (1.3, 3.5) |
| Smokers | 1.8 (1.4, 2.5) | 2.0 (1.4, 2.7) | 1.8 (1.1, 2.8) | 4.2 (1.4, 12.5) | 1.8 (1.1, 3.1) | 2.1 (1.4, 3.2) |
| Smoking unknown | 3.1 (1.9, 5.0) | 2.9 (1.6, 5.1) | 2.4 (1.1, 5.2) | n/a | 2.9 (1.3, 6.7) | 2.7 (1.2, 5.9) |
| 1–2 non-OC scripts | 1.5 (1.0, 2.3) | 1.6 (1.0, 2.4) | 1.7 (0.9, 3.1) | 4.1 (0.8, 21.2) | 1.1 (0.5, 2.3) | 2.0 (1.1, 3.4) |
| 3 + non-OC scripts | 2.5 (1.8, 3.5) | 2.2 (1.6, 3.1) | 2.1 (1.3, 3.5) | 3.3 (0.7, 14.9) | 2.4 (1.3, 4.1) | 2.1 (1.3, 3.4) |


The principal finding from this study is that an anticoagulation-supported diagnosis of VTE on the GPRD is supported by hospital investigations in 83% of cases, and that such diagnoses are made on clinical grounds in 17%. This would suggest (assuming no differential recording of cases for whom additional information was not available) that the GPRD is of sufficiently high quality to allow its use for epidemiological research of this nature. The characteristics (BMI, asthma, smoking categories, etc.) of those patients treated on clinical grounds (equivocal cases) or those for whom additional information was not available (unverified cases) resembled more closely the verified cases than the controls. This would support the notion that the majority of these cases are true cases and can legitimately be included in the analysis. In further support of this, excluding the equivocal or unverified cases from the analysis and using only verified cases has almost no effect on the adjusted odds, with the exception of desogestrel 150 μg + EE 20 μg (OR 1.3 for verified cases versus 0.8 for all cases; the OR of 0.3 for the unverified group was based on only two cases).

Only 81% of cases had a record on the database indicating hospital admission. However, it seems from data from the verified cases that the presence of a prescription for an anticoagulant was a much more sensitive and specific marker of hospitalization. This is because general practitioners are not the sole gatekeepers to the secondary care system. Many patients will be admitted directly to hospital by after-hours services or will be admitted through the ambulance service or following their presentation at Accident and Emergency. The doctors providing data to the GPRD meet the highest standards with respect to the completeness and accuracy of their prescribing and diagnostic data, but recording administrative data, such as the method of admission to hospital, is not one of the required standards, and its absence does not reduce the validity of the database. We believe that restricting the search to cases of VTE with a record of hospital admission leads to an under-recording of cases and a subsequent under-estimate of the true incidence of VTE in young women taking combined oral contraceptives. When compared with cases with a record of hospital admission on the database, those cases without such a record showed no real difference in terms of COC formulation, BMI, asthma or smoking status.

In all cases, apart from the fatalities, the general practitioner records confirmed the prescription of anticoagulants. This included six cases where the evidence of anticoagulation from the database did not include a prescription but did include supportive evidence through records of INR testing or attendance at anticoagulation clinics. Again this shows that, for completeness, evidence of anticoagulation should not be restricted to simply a prescription for an anticoagulant, but that a wider search strategy should be used.

In summary we have shown that the GPRD provides diagnostic and prescribing data of the highest quality. We believe that a validation exercise such as the one carried out in this study is necessary when the accuracy and completeness of the data are in doubt. If there is misclassification, this is likely to bias the findings towards the null hypothesis [13]. However, once validation has been carried out further case-control studies of VTE should be possible without necessarily requiring cases to be validated from hospital records.
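The statement that misclassification biases findings towards the null can be illustrated numerically. The sketch below is a simplified model under explicit assumptions, not an analysis from the study: false-positive cases are assumed to have the same exposure prevalence as controls (nondifferential misclassification), and the true OR of 2.0 and control exposure prevalence of 0.2 are invented for illustration. The value 0.83 is used as a deliberately pessimistic proportion of true cases that treats every equivocal case as a false positive, which the paper does not claim.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def observed_or(true_or: float, control_exposure: float, ppv: float) -> float:
    """Odds ratio observed when only a fraction ppv of recorded cases are true
    cases and the remainder have the same exposure prevalence as controls."""
    p0 = control_exposure
    true_case_odds = true_or * odds(p0)
    p1 = true_case_odds / (1 + true_case_odds)  # exposure among true cases
    p_observed = ppv * p1 + (1 - ppv) * p0      # mixture seen in recorded cases
    return odds(p_observed) / odds(p0)


# Hypothetical inputs (see the assumptions stated above).
attenuated = observed_or(true_or=2.0, control_exposure=0.2, ppv=0.83)
```

Under these assumptions the observed OR lies between 1 and the true value of 2.0, so a real association would be understated rather than exaggerated.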


This study was funded by an unconditional grant from NV Organon and Schering AG.