ChatGPT in pharmacometrics? Potential opportunities and limitations
The authors confirm that the Principal Investigator for this paper is R.A.A. Mathôt and that no interventions were performed on human subjects or patients and no substances were administered.
Abstract
This study explored the potential of using ChatGPT in pharmacometrics, focusing on the development of a population pharmacokinetic (PK) model for standard half-life factor VIII. Our results demonstrated that ChatGPT can be used to accurately obtain typical PK parameters from the literature, generate a population PK model in R and develop an interactive Shiny application to visualize the results. ChatGPT's language generation capabilities enabled the development of R code with minimal programming knowledge and helped to identify and fix errors in the code. While ChatGPT presents several advantages, such as its ability to streamline the development process, its use in pharmacometrics also has limitations and challenges, including the accuracy and reliability of AI-generated data and the lack of transparency and reproducibility of the code generated by ChatGPT. Overall, our study demonstrates the potential of using ChatGPT in pharmacometrics, but researchers must carefully evaluate its use for their specific needs.
1 INTRODUCTION
Over the years, the use of artificial intelligence (AI) in medical research has shown great promise in enhancing drug discovery, identifying new treatment targets and predicting disease outcomes.1 AI is an umbrella term encompassing several advanced technologies, such as machine learning, natural language processing and deep learning. These methods facilitate the extraction of patterns and insights from vast amounts of data. A recent exciting development in AI research has been the public release of ChatGPT,2 developed by OpenAI. The model architecture behind ChatGPT (GPT; Generative Pre-trained Transformer3) has been shown to be highly capable of natural language understanding, while its accessible graphical user interface has led to widespread adoption.
Large language models (LLMs) such as ChatGPT are trained on an enormous corpus of text to generate responses to queries.4 Because considerable human effort was devoted to rating the quality of generated responses and retraining the model to produce the best ones, ChatGPT has surprised many by producing fluent and accurate responses to human inquiries. Aside from the public interest in ChatGPT, it has also been suggested that the model could assist students and researchers by editing text, answering questions, writing code and finding relevant literature for a given query.5-8
Several publications already discuss the potential impact of LLMs on a wide range of research fields.9-11 However, it remains unknown whether tools like ChatGPT can also support researchers in relatively small research fields that are potentially underrepresented in the training data. In this work, we investigate whether ChatGPT can assist in the development of population pharmacokinetic (PK) models. As a use case, we use ChatGPT to generate R code for predicting in vivo drug concentrations of standard half-life factor VIII (FVIII) concentrates in patients with haemophilia A.12 Next, we query ChatGPT to generate an interactive R Shiny application that can be used to interpret the model and to select optimal doses for reaching specific target FVIII levels. Based on this use case, we aim to show that researchers unfamiliar with programming in R can nonetheless produce usable code for data analysis, and we discuss the limitations of this approach.
2 METHODS
2.1 Data collection and model development
We used the official implementation of ChatGPT v3.5 (https://chat.openai.com; OpenAI; 2023 May 24 version) to submit queries and receive answers. Our aim was to build a population PK model for standard half-life FVIII in R and to visualize the results in a Shiny application. First, a simple query (“in R, make a one-compartment pharmacokinetic model for FVIII”) was submitted and the output generated by ChatGPT was evaluated. Based on the generated R code, follow-up queries were used to extend the functionality of the PK model. These follow-up queries are listed below, grouped by topic.
2.2 Population PK modelling
- In R, can you make a one-compartment pharmacokinetic model for FVIII using the package ‘desolve’? In the R code, use a CL of 2.5 dL/h and a V of 40 dL.
- Can you use a dose of 1000 IU?
- Can you add allometric body weight scaling on the PK parameters?
- Can you also include inter-individual variability on the PK parameters?
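To make the model requested by these queries concrete, the following is a minimal Python sketch (not the R code generated in this study) that could serve as a reference when checking generated code. It uses the closed-form solution of a one-compartment model after an intravenous bolus, which for this linear model agrees with numerically solving dA/dt = -(CL/V)·A as deSolve would. The 70 kg reference weight, the allometric exponents (0.75 for CL, 1 for V) and the IIV magnitudes are assumptions chosen for illustration, not values reported here.

```python
import math
import random

# Typical values from the queries; everything marked "assumed" is illustrative only.
CL_TYP = 2.5    # clearance, dL/h
V_TYP = 40.0    # volume of distribution, dL
WT_REF = 70.0   # reference body weight, kg (assumed)
OMEGA_CL = 0.3  # SD of log-normal IIV on CL (assumed)
OMEGA_V = 0.2   # SD of log-normal IIV on V (assumed)


def individual_parameters(weight, eta_cl=0.0, eta_v=0.0):
    """Scale CL and V allometrically to body weight and apply exponential IIV."""
    cl = CL_TYP * (weight / WT_REF) ** 0.75 * math.exp(eta_cl)
    v = V_TYP * (weight / WT_REF) * math.exp(eta_v)
    return cl, v


def sample_etas(rng):
    """Draw one subject's random effects (eta_CL, eta_V) from normal distributions."""
    return rng.gauss(0.0, OMEGA_CL), rng.gauss(0.0, OMEGA_V)


def concentration(t, dose, cl, v):
    """FVIII level (IU/dL) at time t (h) after an IV bolus dose (IU)."""
    return dose / v * math.exp(-cl / v * t)


if __name__ == "__main__":
    cl, v = individual_parameters(weight=70.0)
    print(concentration(0.0, 1000.0, cl, v))  # typical subject, 1000 IU: prints 25.0

    rng = random.Random(1)
    cl_i, v_i = individual_parameters(85.0, *sample_etas(rng))
    print(round(concentration(24.0, 1000.0, cl_i, v_i), 2))
```

Exponential random effects keep individual CL and V positive, which is the usual reason for this parameterization.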
2.3 Population PK simulations
- Can you simulate a population of 50 subjects?
- Can you display every subject into one plot?
- How can I reproduce this population for the simulation?
- In the plot, can you display a prediction interval of the FVIII levels for the simulated population? Add a shaded area for the prediction interval in the plot.
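The simulation queries (50 subjects, a reproducible population, a shaded prediction interval) can be illustrated with the hedged Python sketch below; it is not the generated R code. Reproducibility comes from seeding a dedicated random generator, the analogue of calling set.seed() in R. The interval uses linear-interpolation percentiles, the same rule as R's default quantile type 7. The reference weight, IIV magnitudes, seed and sampling times are assumptions for illustration.

```python
import math
import random

CL_TYP, V_TYP = 2.5, 40.0      # dL/h and dL, from the queries
WT_REF = 70.0                  # kg, assumed reference weight
OMEGA_CL, OMEGA_V = 0.3, 0.2   # assumed SDs of log-normal IIV


def simulate_population(n, dose, weight, times, seed=2023):
    """Simulate FVIII profiles (IU/dL) for n subjects of the same body weight.

    Seeding a dedicated random generator makes the population reproducible.
    """
    rng = random.Random(seed)
    profiles = []
    for _ in range(n):
        cl = CL_TYP * (weight / WT_REF) ** 0.75 * math.exp(rng.gauss(0.0, OMEGA_CL))
        v = V_TYP * (weight / WT_REF) * math.exp(rng.gauss(0.0, OMEGA_V))
        profiles.append([dose / v * math.exp(-cl / v * t) for t in times])
    return profiles


def percentile(values, p):
    """Percentile with linear interpolation (R's default quantile type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def prediction_interval(profiles, level=95.0):
    """Return (lower, median, upper) per time point, e.g. for a shaded band."""
    tail = (100.0 - level) / 2.0
    bands = []
    for column in zip(*profiles):  # one column of subject values per time point
        bands.append((percentile(column, tail),
                      percentile(column, 50.0),
                      percentile(column, 100.0 - tail)))
    return bands


if __name__ == "__main__":
    times = [0, 4, 8, 12, 24, 48]
    pop = simulate_population(50, 1000.0, 70.0, times)
    for t, (lo, med, hi) in zip(times, prediction_interval(pop)):
        print(f"t={t:>2} h  median={med:6.1f}  95% PI=[{lo:6.1f}, {hi:6.1f}] IU/dL")
```

With only 50 subjects, the 2.5th and 97.5th percentiles rest on very few observations, so the band edges will move noticeably if the seed is changed.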
2.4 Shiny application
- Can you display the results in a shiny application?
- Define a slider input for dose (250–4000 IU, steps of 250 IU) and body weight (40–100 kg, steps of 1 kg).
- Also, add as a slider a target FVIII level (ranging from 30 to 100 IU/dL with steps of 10 IU/dL). The app should also print the probability of reaching the target FVIII level at time 0 in this whole population.
The responses to the queries above were regenerated multiple times to assess the reproducibility of ChatGPT's output.
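The Shiny queries define three slider inputs and ask for the probability of reaching a target FVIII level. One way to read that probability is as the fraction of simulated subjects whose level at a chosen time is at or above the target. The Python sketch below (not the generated R/Shiny code) computes this fraction on the slider grids. The reference weight, IIV magnitudes and seed are assumptions. The `smallest_dose_for` helper and its 90% threshold are hypothetical additions, included to illustrate dose selection.

```python
import math
import random

CL_TYP, V_TYP, WT_REF = 2.5, 40.0, 70.0  # dL/h, dL, kg (reference weight assumed)
OMEGA_CL, OMEGA_V = 0.3, 0.2             # assumed IIV (SD of log-normal etas)

# Slider grids from the Shiny queries.
DOSES = range(250, 4001, 250)    # IU
WEIGHTS = range(40, 101)         # kg
TARGETS = range(30, 101, 10)     # IU/dL


def sample_parameters(n, weight, seed=2023):
    """Draw (CL, V) for n subjects; a fixed seed keeps the population identical across slider moves."""
    rng = random.Random(seed)
    params = []
    for _ in range(n):
        cl = CL_TYP * (weight / WT_REF) ** 0.75 * math.exp(rng.gauss(0.0, OMEGA_CL))
        v = V_TYP * (weight / WT_REF) * math.exp(rng.gauss(0.0, OMEGA_V))
        params.append((cl, v))
    return params


def probability_of_target(dose, weight, target, t=0.0, n=50, seed=2023):
    """Fraction of simulated subjects with a FVIII level >= target (IU/dL) at time t (h)."""
    params = sample_parameters(n, weight, seed)
    hits = sum(dose / v * math.exp(-cl / v * t) >= target for cl, v in params)
    return hits / n


def smallest_dose_for(weight, target, t=0.0, min_prob=0.9, n=50, seed=2023):
    """Hypothetical helper: lowest slider dose whose probability reaches min_prob, else None."""
    for dose in DOSES:
        if probability_of_target(dose, weight, target, t, n, seed) >= min_prob:
            return dose
    return None


if __name__ == "__main__":
    print(probability_of_target(dose=1000, weight=70, target=30))
    print(smallest_dose_for(weight=70, target=50))
```

Because the same seed yields the same simulated subjects for every dose, the probability can only increase as the dose slider moves up, which keeps the displayed values consistent when users compare doses.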
3 RESULTS
3.1 Developing a population PK model using ChatGPT
First, we asked ChatGPT to make a one-compartment pharmacokinetic model for FVIII in R and regenerated the response several times. Each regeneration produced different R code, using different R packages to implement the PK model. Furthermore, we encountered several errors when executing some of the generated R code. A short overview of the output is displayed in Figure 1.
Next, more detailed queries were used in a step-wise manner to have ChatGPT generate a population PK model. Some of the output is displayed in Figure 2. We then iteratively asked ChatGPT to add components to the model. ChatGPT correctly applied allometric body weight scaling to the PK parameters and added inter-individual variability. Some of the code generated by ChatGPT resulted in errors. For example, after we requested a dose of 1000 IU, the initial FVIII concentration was incorrect. When we asked ChatGPT to correct this error, it successfully corrected the initial concentration. Other errors occurred as well; however, ChatGPT demonstrated its problem-solving capabilities by providing revised R code and solutions to address these issues. Not only did ChatGPT often produce functional code, it also explained each section of the code. Afterwards, we asked ChatGPT to develop a Shiny application to visualize FVIII levels. The Shiny application allowed the user to adjust the patient's body weight (from 40 to 100 kg in steps of 1 kg) and the dose (from 250 to 4000 IU). A shaded area displayed the 95% prediction interval. ChatGPT successfully generated a Shiny application that simulates population FVIII levels over time, with realistic predictions13 (Figure 3). The R code is displayed in Appendix S1.
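As a quick check on the corrected output, the expected values follow directly from the typical parameters in the query (CL = 2.5 dL/h, V = 40 dL, dose = 1000 IU), assuming an intravenous bolus. The allometric form in the last line assumes a 70 kg reference weight and exponents of 0.75 and 1, which are conventional choices rather than values stated in this study.

```latex
\begin{aligned}
C(t) &= \frac{\mathrm{Dose}}{V}\, e^{-\frac{CL}{V}\, t}, \\
C(0) &= \frac{1000\ \mathrm{IU}}{40\ \mathrm{dL}} = 25\ \mathrm{IU/dL}, \\
t_{1/2} &= \frac{\ln 2 \cdot V}{CL} = \frac{0.693 \times 40\ \mathrm{dL}}{2.5\ \mathrm{dL/h}} \approx 11.1\ \mathrm{h}, \\
CL_i &= CL \left(\frac{WT_i}{70}\right)^{0.75} e^{\eta_{CL,i}}, \qquad
V_i = V \left(\frac{WT_i}{70}\right) e^{\eta_{V,i}}.
\end{aligned}
```

Generated code that reports an initial level other than 25 IU/dL for a typical 70 kg subject after 1000 IU can therefore be flagged immediately.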
3.2 What it cannot do
While ChatGPT was successful in generating the model and Shiny application, reruns of the same prompts often produced different outcomes, some of which resulted in R code that failed with an error. To assess the impact of the querying approach, we compared a single simple query with a series of step-wise queries. Step-wise queries produced more reproducible outcomes and more similar R code across reruns, although exact replication of the R code for a given query remained difficult. Moreover, ChatGPT was successful in generating R code for single-dose simulations but struggled to provide appropriate code for simulating multiple doses of FVIII. Therefore, caution must be exercised when using R code generated by ChatGPT in pharmacometrics. We also asked ChatGPT to produce NONMEM code for the same model. Although the output resembled a NONMEM control stream, it contained multiple errors and redundancies and failed to run.
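Multiple-dose simulation is a case where a reference implementation helps judge generated code. For a linear one-compartment model, the profile after repeated bolus doses is simply the sum of the single-dose curves (superposition); in deSolve, such dose events can be supplied through the events argument of ode(). The Python sketch below illustrates superposition only. The regimen used in the usage lines (1000 IU every 48 h, three doses) is hypothetical.

```python
import math


def multiple_dose_concentration(t, dose, tau, n_doses, cl, v):
    """FVIII level (IU/dL) at time t (h) after n_doses IV bolus doses given every tau hours.

    In a linear one-compartment model each administered dose contributes
    dose / v * exp(-(cl / v) * (t - t_dose)), so the profile is the sum of
    the single-dose curves (superposition).
    """
    k = cl / v  # elimination rate constant, 1/h
    total = 0.0
    for i in range(n_doses):
        t_dose = i * tau
        if t >= t_dose:  # only doses already given contribute
            total += dose / v * math.exp(-k * (t - t_dose))
    return total


if __name__ == "__main__":
    # Hypothetical regimen: 1000 IU every 48 h, three doses, typical CL and V.
    for t in (0, 47.9, 48, 96, 120):
        print(t, round(multiple_dose_concentration(t, 1000.0, 48.0, 3, 2.5, 40.0), 2))
```

Superposition holds only for linear kinetics. If generated code adds nonlinear elements, the event-based ODE approach is the safer route.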
4 DISCUSSION
We show that ChatGPT can generate functional R code for predicting drug concentrations with a population PK model, as well as an interactive Shiny application to visualize model predictions.
ChatGPT generated a one-compartment population PK model in R and updated the code according to user specifications. By using ChatGPT to develop a Shiny application in R, users inexperienced with R Shiny can readily produce web applications for interpreting their models. Both the model code and the application show that ChatGPT can be used without extensive programming knowledge. This can substantially reduce development time and effort while potentially improving the user experience of such applications. Another advantage of using ChatGPT for programming is its ability to help developers identify and fix errors in their code. ChatGPT can suggest possible solutions for errors and other coding mistakes, which helps inexperienced users debug their code.14, 15 This can streamline the development process and improve the overall quality of R code.
There are also limitations and challenges to the use of ChatGPT for applications related to pharmacometrics. ChatGPT is a stochastic model, meaning that submitting the same query multiple times may yield different results. This variability can be influenced by factors such as model randomness and potential biases in the training data but, most importantly, by the phrasing of the input. During our analysis, we observed that repeated submissions of a simple query yielded different outcomes. For instance, ChatGPT generated R code using packages such as mrgsolve or deSolve to implement the PK model. To address this, we used step-wise queries with specific instructions to use the deSolve package. Additionally, we employed more detailed queries to develop the model and Shiny application, resulting in higher success rates and improved reproducibility. Careful phrasing of queries is therefore important to produce the desired results. To enhance reproducibility, we recommend documenting the exact inputs used and the code generated by ChatGPT. Such documentation helps recreate the same conditions in subsequent runs or when sharing the code with others, and allows other researchers or users to attempt to reproduce the results.
Moreover, code generated by ChatGPT may contain errors and therefore may not produce the intended results. Errors can often be resolved by regenerating the R code or by pasting the error message from the R console into ChatGPT as a follow-up query. Code generated by ChatGPT should be thoroughly reviewed and validated to ensure its correctness and completeness. This involves testing the code, comparing the results with expected outcomes and verifying that it aligns with established pharmacometric principles and practices. Unfortunately, this may be difficult for a key target audience: those still learning to code.
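One concrete way to compare results with expected outcomes is to check a numerically solved model against a known closed-form solution. The Python sketch below uses a hand-written fourth-order Runge-Kutta integrator as a stand-in for the ODE solver in generated code. It then flags any time point where the result deviates from C(t) = Dose/V · exp(-CL/V · t). The step size and tolerance are assumptions; in practice, the check would be applied to the output of the generated R code itself.

```python
import math


def rk4_concentration(dose, cl, v, t_end, dt=0.1):
    """Integrate dA/dt = -(CL/V) * A from A(0) = dose with classical RK4; return A(t_end) / V."""
    def rate(amount):
        return -(cl / v) * amount

    amount = dose
    for _ in range(round(t_end / dt)):  # assumes t_end is a multiple of dt
        k1 = rate(amount)
        k2 = rate(amount + 0.5 * dt * k1)
        k3 = rate(amount + 0.5 * dt * k2)
        k4 = rate(amount + dt * k3)
        amount += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return amount / v


def validate_against_analytic(dose, cl, v, times, rel_tol=1e-6):
    """Return (t, observed, expected) for every time point where the two solutions disagree."""
    mismatches = []
    for t in times:
        expected = dose / v * math.exp(-cl / v * t)
        observed = rk4_concentration(dose, cl, v, t)
        if not math.isclose(observed, expected, rel_tol=rel_tol):
            mismatches.append((t, observed, expected))
    return mismatches


if __name__ == "__main__":
    print(validate_against_analytic(1000.0, 2.5, 40.0, [0, 6, 12, 24, 48]))  # prints []
```

An empty list means agreement. A wrong initial condition, such as using the dose instead of dose/V as the starting concentration, would show up as a mismatch at every time point.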
Next, the accuracy and reliability of AI-generated data may be affected by biases and knowledge gaps in the training data or by the complexity of the query, for example, when asking for code for more complex biological systems.16, 17 ChatGPT often appears very confident in its responses. However, when the user replies with a message such as ‘there was an error in the code’, it is quick to acknowledge its previous response as incorrect, even if it was not. Additionally, the lack of transparency and interpretability of AI algorithms may raise ethical concerns and limit their widespread adoption.18, 19
Another limitation is that, in our tests, ChatGPT was unable to generate functional NONMEM control streams. This is unfortunate, as NONMEM is considered the gold standard in pharmacometrics research, and help with identifying errors in these control streams could greatly support students learning to use it.20 This may be due to the limited number of publicly available control streams, which makes it difficult for ChatGPT to learn to generate accurate and reliable code for NONMEM models.
In conclusion, the integration of ChatGPT in pharmacometrics has the potential to streamline the development process and improve the user experience for pharmacometrics researchers. We deem it unlikely that ChatGPT in its current state will replace pharmacometricians. ChatGPT does, however, have great value in helping researchers find and explain information, in generating and debugging code and in educating new generations of pharmacometricians. As ChatGPT continues to evolve and improve, it may become an even more valuable tool in the field. Other pharmacometricians are likely to find new and innovative ways to integrate it into their workflows and to extend its usefulness in pharmacometrics.
4.1 Nomenclature of targets and ligands
Key protein targets and ligands in this article are hyperlinked to corresponding entries in http://www.guidetopharmacology.org and are permanently archived in the Concise Guide to PHARMACOLOGY 2019/20.21
AUTHOR CONTRIBUTIONS
Michael E. Cloesmeijer performed the analysis. Michael E. Cloesmeijer, Alexander Janssen and Sjoerd F. Koopman wrote the manuscript. All authors contributed substantially to the critical revision of the manuscript and approved the final draft.
ACKNOWLEDGMENTS
The SYMPHONY consortium aims to orchestrate personalized treatment in patients with bleeding disorders and is a unique collaboration between patients, healthcare professionals and translational and fundamental researchers specialized in inherited bleeding disorders, as well as experts from multiple disciplines. It aims to identify the best treatment choice for each individual based on bleeding phenotype. To achieve this goal, work packages have been organized according to three themes: Diagnostics (work packages 3 and 4), Treatment (work packages 5–9) and Fundamental Research (work packages 10–12). This research received funding from the Netherlands Organization for Scientific Research (NWO) in the framework of the NWA-ORC Call grant agreement NWA.1160.18.038. Principal investigator: Dr M.H. Cnossen; project coordinator: Dr S.H. Reitsma.
Beneficiaries of the SYMPHONY consortium: Erasmus University Medical Center-Sophia Children's Hospital, project leadership and coordination; Sanquin Diagnostics; Sanquin Research; Amsterdam University Medical Centers; University Medical Center Groningen; University Medical Center Utrecht; Leiden University Medical Center; Radboud University Medical Center; Netherlands Society of Hemophilia Patients (NVHP); Netherlands Society for Thrombosis and Hemostasis (NVTH); Bayer B.V., CSL Behring B.V., Swedish Orphan Biovitrum (Belgium) BVBA/SPRL.
CONFLICT OF INTEREST STATEMENT
M.H.C.'s institution has received investigator-initiated research and travel grants as well as speaker fees over the years from the Netherlands Organization for Scientific Research (NWO) and Netherlands National Research Agenda (NWA), the Netherlands Organization for Health Research and Development (ZonMw), the Dutch Innovatiefonds Zorgverzekeraars, Baxter/Baxalta/Shire/Takeda, Pfizer, Bayer Schering Pharma, CSL Behring, Sobi Biogen, Novo Nordisk, Novartis and Nordic Pharma and for serving as a steering board member for Roche, Bayer and Novartis for which fees go to the Erasmus MC as an institution. R.A.A.M. has received grants from governmental and societal research institutes such as NWO, ZonMW, Dutch Kidney Foundation and Innovation Fund and unrestricted investigator research grants from Baxter/Baxalta/Shire/Takeda, Bayer, CSL Behring, Sobi and CelltrionHC. He has served as advisor for Bayer, CSL Behring, Merck Sharp & Dohme and Baxter/Baxalta/Shire/Takeda. All grants and fees were paid to the institution. Other authors have no conflict of interest to declare for this paper.
Open Research
DATA AVAILABILITY STATEMENT
Not applicable.