Volume 176, Issue 21 p. 4107-4118
REVIEW ARTICLE
Sex bias in preclinical research and an exploration of how to change the status quo

Natasha A Karp

Corresponding Author

Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Cambridge, UK

Correspondence: Natasha A. Karp, Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Darwin Building (Unit 310), Cambridge Science Park, Cambridge CB4 0WG, UK. E-mail: [email protected]
Neil Reavey

Council for Science and Animal Welfare, AstraZeneca, Cambridge, UK

Drug Safety and Metabolism, IMED Biotech Unit, AstraZeneca, Cambridge, UK

First published: 12 November 2018
This article is part of a themed section on The Importance of Sex Differences in Pharmacology Research. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v176.21/issuetoc

Abstract

There has been a revolution within clinical trials to include females in the research pipeline. However, there has been limited change in the preclinical arena, even though the research here lays the groundwork for the subsequent clinical trials. Sex bias has been highlighted as one of the contributing factors to the poor translation and replicability issues undermining preclinical research. There have been multiple calls for action, and the funders of biomedical research are actively pushing the inclusion of sex as a biological variable. Here, we consider the current standard practice within the preclinical research setting, why there is a movement to include females and why the imbalance exists. We explore organizational change theory as a tool to shape strategies needed at an individual and institute level to change the status quo. The ultimate goal is to create a scientific environment in which our preclinical research automatically implements sex-sensitive approaches.

Abbreviations

  • 3Rs: Replacement, Reduction and Refinement
  • ADR: adverse drug reaction
  • IMPC: International Mouse Phenotyping Consortium
  • NIH: National Institutes of Health

Sex bias in preclinical research

    An endemic imbalance

    Within preclinical research, there is an endemic and persistent sex bias, with studies predominantly focusing on male animals (Beery and Zucker, 2011). Looking across 10 fields of biological research, it has been shown that 8 out of 10 had a male bias, that single-sex studies of male animals outnumbered those of females 5.5 to 1 and that, in six fields, 80% of the studies were only on male rodents (Beery and Zucker, 2011). This bias is a persistent product of our research pipeline as it has not changed in a 20-year period (Mazure and Jones, 2015). Even where the disease of interest is a female-prevalent disorder, a male bias is apparent. Yoon et al. (2014) found that in publications studying disorders prevalent in women, only 12% studied females or both sexes.

    Sex bias is not restricted to in vivo studies. For in vitro studies, the relevance of sex has traditionally been discounted, as demonstrated by the observation that the majority of studies involving newly generated cells failed to specify the sex (Taylor et al., 2011) and, when sex is reported, 71% studied only males (Yoon et al., 2014). In the pipeline of cell providers, the majority of cells are sold without defining the sex (Lee, 2018).

    It has been argued that the scientific community would know when sex was a significant source of variation and should be trusted (Fields, 2014). However, an outcome of the historic sex bias is that we have a ‘knowledge gap’ (Johnson et al., 2014). Consequently, just because the community always pools data or studies one sex does not mean it is an informed decision. Research from the International Mouse Phenotyping Consortium (IMPC), looking at data from 10 institutes, 14 thousand wildtype mice and 40 thousand knockout mice for 234 traits, found that sex was a significant source of variation within control data and acted as a modifier of a treatment effect (Karp et al., 2017). This study also highlighted that sexually dimorphic differences were typically differences in the size of the effect rather than an effect that is specific to one sex. It has been argued that, when describing sex differences, it is important to avoid language such as ‘better’, ‘improved’ or ‘worse’, as it implies that one of the sexes is the norm, and that we should instead be more objective, using terms such as greater, less, higher or lower to describe how a difference depends on sex (McCullough et al., 2014). The scale of the IMPC study, observing sexually dimorphic effects across a large number of biological systems, highlights that our default position should be to automatically study both sexes and account for sex as a source of variation.

    Here, we will discuss the impact of this imbalance and the steps being taken to address the bias. The resistance to change will be explored at a cultural, institute and individual level through application of a number of change management theories. We have limited the exploration to a focus on an individual research institute and have not touched upon the wider scientific culture which is influenced by journals, professional bodies, etc. Resistant forces throughout the research pipeline, from experimental design through analysis to reporting, will be considered. Strategies to address the resistance and support successful transition will then be suggested.

    The risks arising from the imbalance

    It is widely accepted that women and men differ in disease progression, severity, treatment efficacy and risk of side effects (Yoon et al., 2014). Furthermore, numerous examples have highlighted that the sexes metabolize drugs differently and that efficacy can depend on sex (Anderson, 2005). A prominent example of the effect of the imbalance during preclinical research involves zolpidem, a treatment for insomnia. Twenty years after the drug was released to market, the US Food and Drug Administration issued an alert that the dosing requirement for women needed to be halved because of the risk to women from significant next morning impairment (Zakiniaeiz et al., 2016). This adjustment of dosing to appropriately manage health risks in women arose because of sex differences in clearance rates that would have been detected if pharmacokinetic studies had been carried out in both male and female animals prior to clinical trials (Zakiniaeiz et al., 2016). This is not an isolated incident; 8 out of 10 drugs withdrawn from the US market between 1997 and 2000 were withdrawn because of significant health risks for women (Heinrich et al., 2001). Female patients have a significantly higher adverse drug reaction (ADR) risk compared to males (Zopf et al., 2008) and the effects are typically more severe in women than in men (Franconi et al., 2017). A review of drug applications found that when the analysis was stratified by sex, 6–7% of the studies found >40% differences in pharmacokinetics between males and females (Anderson, 2005). Anderson explored the issue and concluded that it is unknown whether dosing females based on their pharmacokinetics would address ADRs and that further research on the role of sex was critical. Initially, it was thought that the differences in toxicity were predominantly due to differences in body size, body fat or reporting rates of concerns. However, it is becoming apparent that the source of sex differences in toxicity is far more complex and needs exploration in the research pipeline.

    The failure to acknowledge a potential sex difference has been highlighted as one of the possible contributing factors to the failure of replication undermining science and to the translational crisis, which questions the value of preclinical research (Collins and Tabak, 2014). Sex bias can contribute to deficiencies in replication if researchers fail to report the sex used, if they allow sex to be an uncontrolled variable during the experiment, or if they fail to account for sex during the analysis when sex is a significant source of variation. In a study considering the low success rate in clinical trials of treatments for motor neuron disease, the authors demonstrated the importance of controlling confounders such as sex for reproducible and translatable research (Scott et al., 2008).

    Failing to consider the role of sex is also a missed opportunity. Precision medicine is an emerging approach to disease treatment and prevention, which is characterized by developing programmes of medical care in which the treatments account for an aspect in which individuals vary. Typically, the types of variability being discussed have focused on differences in genetic factors, environment or lifestyle. Optimizing medicine based on the sex of the individual is an easily identifiable adjustment to a treatment plan. With 50% of us being one sex or another, focusing on this variable could have a significant impact on health care.

    The call to address the imbalance

    The sex bias in preclinical research has been discussed for over 20 years and yet little progress has been made (Mazure and Jones, 2015). This is despite numerous initiatives to encourage integration of sex in the preclinical pipeline (Lee, 2018). This is why the US National Institutes of Health (NIH) announced a multidimensional approach including both a requirement to change practice and support to enable the change (Clayton and Collins, 2014). The NIH now require applicants to report their plans for including sex as a biological variable in studies involving animals or cell lines, ‘unless sex-specific inclusion is unwarranted based on rigorously defined exceptions’ (Clayton and Collins, 2014). The exceptions include purely molecular studies, such as protein–protein interactions; studies on sex-specific conditions or phenomena, such as testicular cancer; studies involving acutely scarce resources, such as non-human primates; and, finally, cases where strong justification can be provided. It was noted that the absence of evidence regarding a sex difference is not suitable justification.

    As the NIH are the largest funders of biomedical research in the world (Moses et al., 2015), it was anticipated that this initiative would lead to a shift in research practice. The NIH are supporting this change by collaborating with stakeholders on the development and implementation of the policies, by developing training modules and by monitoring the effects with ongoing data analysis (Clayton and Collins, 2014). It is important to note that the requirement is not to design experiments that can identify sex differences in the response but rather to collect data from both sexes, to analyse the data accounting for sex as a source of variation and to visualize and summarize the data by sex. Then, large differences, if they exist, will be identifiable.
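    Summarizing data by sex, as described above, can be sketched in a few lines. The example below is a minimal illustration, not an analysis method taken from the NIH policy: the measurements, group labels and function names are hypothetical and the values are made up. It reports the mean for each sex and group, the treatment effect within each sex and the difference between those two effects. A real analysis would fit a statistical model that includes sex and its interaction with treatment and would report the uncertainty of each estimate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (sex, group, value). Illustrative values only.
records = [
    ("F", "control", 10.0), ("F", "control", 12.0),
    ("F", "treated", 15.0), ("F", "treated", 17.0),
    ("M", "control", 20.0), ("M", "control", 22.0),
    ("M", "treated", 22.0), ("M", "treated", 24.0),
]

def summarize_by_sex(rows):
    """Return {sex: {group: mean value}}."""
    cells = defaultdict(list)
    for sex, group, value in rows:
        cells[(sex, group)].append(value)
    summary = defaultdict(dict)
    for (sex, group), values in cells.items():
        summary[sex][group] = mean(values)
    return {sex: dict(groups) for sex, groups in summary.items()}

def treatment_effects(summary):
    """Treated-minus-control difference in means, computed within each sex."""
    return {sex: g["treated"] - g["control"] for sex, g in summary.items()}

summary = summarize_by_sex(records)
effects = treatment_effects(summary)
# Difference between the female and male treatment effects
# (a simple sex-by-treatment interaction contrast).
interaction = effects["F"] - effects["M"]

print(summary)
print(effects)
print(interaction)
```

    In this made-up data set the treatment effect is present in both sexes but differs in size, which is the pattern the IMPC study reported as typical.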

    A similar initiative by the Canadian Institutes of Health Research, with mandatory questions around sex and gender during the research funding application process, led to a substantial increase in the proportion of applications considering both sexes in the proposal (from 26 to 48%) (Johnson et al., 2014). The authors did note that biomedical researchers were least likely to report that they had accounted for sex in their studies. These findings highlight the opportunities for policy interventions to address the sex bias. However, not all research is funded through these platforms and automatic inclusion of two sexes within our research pipelines has yet to be achieved.

    Resistance to change

    Based on the data and arguments to date, sex bias should be a historic issue rather than an on-going standard practice. The scientific pipeline of manuscripts and presentations on the topic of sex bias will lead individual researchers to understand the concepts and typically they will agree with the philosophy. However, the researchers will return to their desk or bench and continue doing things as they have always done. Implementing change so that it embeds in the culture and results in long lasting change in behaviour is difficult and the journey is easily derailed. In this context, we need to consider the application of change theory to enable successful transition of behaviours and actions within our organizations. Change is not a management issue, rather it is a leadership issue; as it is leaders who build new systems or transform old ones (Kotter, 2012).

    The role of organizational change theories in addressing sex bias

    For a change initiative to be successful, we are looking to shape culture within an institute in order to drive changes in practice. Institutional culture arises from the experiences of individuals and the relationship they have with the institute, its policies, processes and history. We therefore need to consider change theories that explore the issues at three levels: cultural, institutional and individual. By applying a variety of change theories, we can begin to understand the barriers and where it is necessary to focus resources to drive change.

    Cultural considerations

    At its most basic level, an institute's culture might be described as ‘how we do things around here’ and while this appears to be a cliché, it speaks to the notion that individual behaviour is a critical influencing factor. The institute can begin to shape its culture through the communication of its values, the hierarchy, governance and team structure. However, cultural foundations are often rooted in the stories, routines and rituals that individuals strongly associate themselves with. For an institute to move away from the current cultural paradigm to one that is more aligned with a new strategic vision, it must address these less tangible aspects by focusing as much on supportive change for an individual as it does on structure, policy and process (Johnson et al., 2011). The challenge of achieving deep and lasting change is described by Schein (2010) in three layers. The first layer, ‘visible artefacts’, offers the most tangible examples of where an institute can initiate change, such as a visible switch in operating model, position statements or published research. The second layer, ‘espoused values’, describes the institute in less tangible terms. This might be the institute's vision, mission and strategic position. On the surface, these are easy to change. There can, however, be a disconnect between these espoused values and the behaviours of people in an institute if mid to senior level leaders do not perform as effective role models. The most challenging cultural layer to modify is described as the ‘underlying assumptions’. These are held by individuals and are influenced by a history of learned experience in an institute. This layer is primarily responsible for the day to day behaviour of an individual and is self-evident in their relationship with the institute. It can be argued that cultural change is at its most effective when there is transaction between an individual's underlying assumptions and the changes being made at the visible artefact and espoused value levels of culture. The key for individuals learning and adopting new behaviour is to unlearn what they consciously, and more importantly unconsciously, know and believe. The approaches outlined in the following sections address some of the cultural considerations described here. However, it should be noted that persistent and committed effort is needed at all levels to drive belief in the change and to help individuals modify their unconscious bias (Senge, 2014).

    Considerations for the institute

    At an institute level, force field analysis (Lewin, 1946), a structured decision-making technique, is an effective way of analysing the forces for and against the change. It is a powerful tool for visualizing the forces maintaining the status quo. To enable change, we need to strengthen driving forces and weaken resistant forces (Figure 1). Lewin, in his three-stage theory of change, describes this as unfreezing the existing equilibrium, moving towards the desired scenario and finally establishing a new equilibrium (Lewin, 1947). In our application of the force field analysis, we have qualitatively and subjectively attributed scores. However, the technique can be employed with an in-depth analysis generating calibrated scores. The qualitative approach is effective for exploring the forces that underpin the status quo, both for and against change, thus illustrating why initiatives to date have not led to the desired change (Figure 1). From this, we can identify a targeted approach that weakens the resisting forces and strengthens the driving forces to unfreeze the equilibrium, allowing a new paradigm to emerge. It demonstrates that a multilayered strategy is critical. The outcome of the force field analysis supports leaders in developing change strategies that focus on the most influential forces, enabling a more expedient and successful transition.

    Figure 1. An example of force field analysis. A typical force field analysis for an imaginary institute demonstrating the interplay of forces maintaining the current status quo of continuing to study only one sex. The numbers 1 to 4 indicate the strength of that force, where 1 is weak and 4 is strong. The qualitative values are institute-dependent and are determined by reflection within an institute on the pressures being felt. For many institutes, the concept of sex as a biological variable is absent from the communication from the leaders and not considered by the ethical review bodies. In these situations, the reason the imbalance has not changed is obvious as the pressures to change are overwhelmed by the resistance. As shown in the diagram in light grey, the addition of forces for change could significantly overcome the resistance and start the path towards better practice. Furthermore, forces against change could be weakened. For example, training and analysis templates would weaken the resistance arising from lack of statistical knowledge.
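    The scoring logic of a force field analysis can be illustrated with a short sketch. The force names and scores below are hypothetical and are not the values shown in Figure 1; summing the scores on each side is a common simplification and is an assumption here, since the scores in our analysis are qualitative and subjective.

```python
# Hypothetical forces scored from 1 (weak) to 4 (strong).
# These are NOT the values from Figure 1.
driving = {
    "funder requirements": 2,
    "published evidence of sex differences": 2,
}
resisting = {
    "belief that females are more variable": 3,
    "reduction pressure on animal numbers": 4,
    "lack of statistical knowledge": 3,
}

def net_force(driving_forces, resisting_forces):
    """Sum of driving scores minus sum of resisting scores.

    A negative value means resistance outweighs the pressure to change.
    """
    return sum(driving_forces.values()) - sum(resisting_forces.values())

before = net_force(driving, resisting)

# Strengthen the driving side with new forces and weaken one resisting
# force (for example through training and analysis templates).
driving_after = {
    **driving,
    "leadership communication": 4,
    "ethical review expectation": 3,
}
resisting_after = {**resisting, "lack of statistical knowledge": 1}
after = net_force(driving_after, resisting_after)

print(before, after)
```

    With these illustrative scores the balance moves from negative to positive only when forces are added and resistance is weakened together, which is the sense in which a multilayered strategy is needed.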

    A world renowned change expert, Prof John Kotter, researched the reasons why change initiatives frequently failed and developed an eight-step change model to implement change successfully (Kotter, 2012). The eight-step process provides a structured framework and has a focus to ensure the change is not derailed and becomes embedded into the culture of the organization. Steps 1 to 4 require significant input from senior leadership, with a focus on creating the case for change and gaining commitment from stakeholders. Steps 5 to 7 still involve senior leadership, but they also need middle management to implement new practice. Finally, step eight is concerned with all stakeholders embedding the changes in the organization's culture, to make new practice stick.

    The first step is the need to create urgency, which has been described as the need to construct a burning platform where it is critical to embrace the change and this will spark motivation. At this stage, it is important to highlight the threats and potential outcomes of maintaining the status quo and to promote the opportunities of including sex in the preclinical research pipeline in order to create the impetus for change. The second step is forming a powerful coalition, which needs to include influential people from all levels of the organization. This is necessary, as this level of change will require strong leadership and visible support. A credible coalition will need to include senior management, principal investigators, head of the animal facilities, chair of the ethical review bodies, etc., to bring the key stakeholders and leaders together to drive the change. The third step requires the coalition to develop a clear vision of what needs to change and why. An effective vision will convey a picture of the future, will be desirable, feasible, communicable and focused, allowing guidance on everyday decision making. An example could be ‘To effectively translate our research, we will automatically implement sex-sensitive approaches in study design, analysis and data presentation’.

    Once this vision is developed, step four focuses on the communication of the vision in a manner that embeds the concept within the institute. This means that an announcement or a lecture is not going to drive change alone. Instead, there must be a concerted effort to sell the concept alongside a persistent demonstration of the behaviours and actions required to change culture. This provides a degree of authenticity that individuals can invest in and this drives belief in the concept. Kotter estimated that if a change initiative was communicated with a 30 min speech, a 1 h long meeting, 600-word article in an institute's communication and a 2000-word memo, this would only amount to 0.6% of the total communication an employee would receive in a 3 month period (Kotter, 2012). Consequently, the vision needs to be simple without jargon and communicated using metaphors, multiple forums, with high repetition and interactively. Kotter's observation that for a change initiative to be successful, 75% of stakeholders will need to ‘buy-in’ to the vision (Kotter, 2012), highlights the importance of these early steps to ensure this level of buy-in is achieved across the institute.

    At step five, we need to identify the obstacles and develop strategies to remove or overcome them to enable people to feel empowered to execute the vision. We will discuss the obstacles in more detail later within this manuscript when we focus on each area of the preclinical research pipeline. Kotter, in step six, highlights the need to create short-term wins by breaking the change into incremental steps which can be publicized and highlighted to generate excitement, conversation and success points along the way. Examples include consultation feasibility studies, trial results of running two sexes or developing analysis pipelines. This will fine-tune the strategy as it moves forward and it will maintain momentum and undermine cynics by the concrete feedback about the validity of the vision.

    To maintain the momentum, step seven, the requirement to build on the change, is necessary to avoid failing because victory is declared too early. As resistance can reassert itself, there is a need to maintain clarity of vision and a sense of urgency. The final step, anchoring change in institute culture, is necessary to ensure it becomes a core behaviour and therefore the automatic strategy. In this step, we need to recognize the journey we have travelled and celebrate and publish the successes and ensure we maintain a team of leaders who prioritize the study of two sexes in our research. It is acknowledged that, although this is presented as an eight-step process, the initial stages are sequential but then later steps will be running concurrently while the earlier stages will need to be continually reinforced.

    Considerations for the individual

    Organizations do not change because of new systems or processes; instead, changing the status quo requires individuals to buy in to a new direction that aligns with their values and beliefs. To understand an individual's emotional reaction to change, we can consider the change curve model. This curve can help visualize the process that each individual navigates (Figure 2) (Kearney and Hyle, 2003). When managing change, we are looking for tactics that enable individuals to move through the curve more quickly. Gary Yukl has highlighted that there are nine main tactics (Table 1) for implementing change and that the reaction to any of these can be resistance, reluctant compliance or commitment (Yukl and Chavez, 2002). Without addressing the unconscious bias of the individual, an initiative will often only achieve reluctant compliance. Unfortunately, reluctant compliance is change that is not embedded: as attention drifts, compliance decreases and behaviour reverts to the historic approaches. Yukl's research highlighted that the most commonly used tactics within an organization (pressure and rational persuasion) resulted in the greatest resistance, whereas the least used tactics (consultation and inspirational appeal) resulted in the greatest level of commitment. Ironically, our main communication strategies as scientists (presentations and manuscripts) are based around the concept of rational persuasion, which might help explain the lack of progress to date. The other strategy commonly used within science, checklists, could be classified as a pressure strategy, which again leads to resistance rather than commitment.

    Figure 2. Change transition curve. A visualization of the change transition curve to highlight the natural emotional reaction to change.
    Table 1. Common influencing tactics to assist people through the change transition curve (Yukl and Chavez, 2002)
    Tactic | Description
    Coalition building | Enlisting the help or endorsement of others to generate a network of supporters, build consensus and define a group position. This will give the change weight and momentum.
    Consultation | Seeking the participation and input of others in developing a course of action to achieve the goal.
    Exchange | This tactic is based on reciprocity and involves rewarding others for their help or involvement.
    Ingratiation/socializing | The use of praise and flattery before or during an attempt to get others to comply with what is requested of them or to support the proposal.
    Inspirational appeal | Appealing to a person's emotions, values, aspirations and ideals. For example, in our case, emphasizing the potential impact on women's health.
    Legitimizing | Using authority or credentials, for example, showing that the request is consistent with policy, procedure or tradition.
    Pressure | The use of consequences to force others to do what you want.
    Personal appeal | In this tactic, a person is asked to do something because of friendship/relationship or loyalty. The risk is that someone can feel manipulated or taken advantage of.
    Rational persuasion | The use of logical arguments, facts and evidence. Brainstorm possible objections ahead of time.

    Understanding that the natural reaction to change is resistance and that people can progress at different rates through the change curve highlights the need to invest appropriately as an institute. An institute will need to give both time and resources to enable the change and ensure the individuals that lead the company are taken on the journey to enable a successful outcome. For example, if senior management have been discussing the strategy for a number of months, they may have completed their transition through the change curve but for the staff on the ground, this may be the first they may have heard of the change and these individuals are at the beginning of the change cycle. Understanding this profile means we should approach the discussions on changing our practices with an expectation of resistance. This knowledge allows us to ride this initial emotional reaction and reminds us that we need to listen carefully to unpick from the individuals what are the real blockers rather than the arguments presented due to emotional resistance to moving forward.

    Exploring resistant forces in the research setting

    As the imbalance is threaded throughout the research pipeline and can be seen in the design, analysis and reporting of studies, we can focus on each of these areas to identify the resistant forces and propose solutions to weaken the resistance. Within Kotter's framework, these would be the obstacles identified in step five, for which strategies are then developed to minimize their effect. These solutions need to be strategies that utilize Yukl's tactics (Yukl and Chavez, 2002), with a focus on those that result in the greatest level of commitment.

    Resistance arising in the experimental design of in vivo studies

    How scientists approach experimental design of in vivo studies is very much driven by the Replacement, Reduction and Refinement (3Rs) framework (Russell and Burch, 1959). The reduction element, with a focus on using the fewest number of animals, has led to an environment where we explore the science in a very narrow testing space and then extrapolate the findings to a wider scenario. It is typical to study one genetic background, one laboratory, one age and one sex and to assume subsequently in the interpretation that the results will be generalizable. Resistance to the use of females has arisen because of the widely accepted belief that female animals are more variable due to the oestrous cycle, unless this is controlled within an experiment (McCarthy, 2015). Reducing variability has been a large driver in the design of experiments, with the focus on ensuring sensitivity to detect a change of interest while using the smallest number of animals. However, a meta-analysis of 293 neuroscience articles, looking at 9932 traits, found that the variability in females was no greater than in males and in some cases was less (Prendergast et al., 2014). Scientists have argued that there are other variables (such as age) that are more critical and that a researcher should be allowed to prioritize appropriately (McCarthy, 2015). However, sex accounts for half the population at any age and is a critical player in evolutionary pressure, as sexual conflict, where selection acts in opposing directions on males and females, maintains genetic diversity (Mank, 2017).

    Our ethical responsibilities are thus driving designs which conduct the research in a narrow testing space with the assumption that we can nevertheless generalize the results. This resisting force is now being questioned. The replication crisis, arising from the failures to replicate earlier published work, has been attributed to many elements throughout the research pipeline but includes a lack of consideration of sex differences (Collins and Tabak, 2014). The replication crisis has led to a refinement of the definition of the reduction element of the 3Rs by the National Centre for the Replacement Refinement and Reduction of Animals in Research (NC3Rs) to ‘appropriately designed and analyzed animal experiments that are robust and reproducible, and truly add to the knowledge base’ (NC3Rs, 2018). The need to update our thinking is reflected in this new definition for reduction, which frames the use of animals not within a single experiment but more globally thus reducing the resisting force to the use of both sexes from an ethical perspective. This, however, is a huge change in perspective for the pipeline that manages in vivo research and not only includes the individual scientists but also the government review bodies, such as the UK Home Office, and institutional ethical review bodies. To date, ethical review bodies have focused on minimizing harm and typically assume the validity and replicability of the studies are met (Vogt et al., 2016). Within an institute, the ethical review body can be a powerful component and can shape the designs run within an institute and the culture around the default position on sex. A fragmented governance structure, common in large institutes, would make decision-making on a change in approach very challenging. The ethical review body, however, has the potential position, with the appropriate empowerment, to drive change where this fragmented structure hinders decision making.

    There are also operational resistant forces at play. At an individual experiment level, practical reasons might arise such that the inclusion of both sexes makes things more complex. While some increase in complexity can be tolerated, too much can lead to mistakes. The complexity arises from practical constraints and managing the welfare of the animals. Consider scent (including pheromones). Rodents use scent marking as a communication strategy. Familiar scent markings reduce anxiety, whereas the scent of the opposite sex can lead to unwelcome changes in the animals, including anxiety and/or changes in experimental output (Hurst, 2005). Within the laboratory, this is often managed by cleaning equipment between studies on the opposing sex. However, for some equipment (e.g. rotarod with a textured surface), it is very difficult to remove the pheromones, and the decision is then made either to study animals in the presence of the scent or to use separate equipment for the sexes. Neither option is ideal; in the latter, the effects of sex on the data are confounded by equipment. A similar situation will arise when working with open cages, as each sex will need to be housed in a separate room within the animal house. Consequently, the effect of sex in this situation is confounded by room. To avoid potential temporal effects biasing the data, an experiment should also collect the data randomized by time. If we study both sexes, we then switch not only between control and treated animals with time but also between the two sexes. This increases the complexity of the experiment, risking mistakes, and the cleaning of equipment will significantly add to the workload.

    In an experiment involving living animals, we are looking to meet our ethical obligations and to ensure the optimum welfare of the animals while also achieving a robust design. The majority of in vivo studies work with animals that are co-housed by sex as a form of environmental enrichment. This leads to another constraint in that we typically process one cage at a time to avoid prolonging the period of time the animals are disturbed and stressed. However, this approach potentially limits the implementation of randomization with time. This limitation will allow confounders to potentially affect our experimental data. By studying both sexes, we further increase the complexity of the experiment and further increase the potential for confounders to affect our studies. These practical limitations mean a ‘perfect’ experiment cannot be run. This is the reality of in vivo studies, and we must work to minimize the flaws and then acknowledge the potential weaknesses. From a distance, changing an experiment to study both sexes looks straightforward. In reality, the situation is more complex and time is needed to discuss these constraints in order to identify solutions.
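
    One possible compromise between processing one cage at a time and randomizing over time is to randomize the order in which cages are handled and then the order of animals within each cage. The sketch below is illustrative only and is not a procedure described by the authors: the cage and animal identifiers and the `run_order` helper are hypothetical, and it assumes each cage holds animals of a single sex.

```python
import random


def run_order(cages, seed=None):
    """Return a processing schedule as a list of (cage_id, animal_id) pairs.

    Cages are handled one at a time (to limit disturbance), but the order of
    cages and the order of animals within each cage are both shuffled.
    """
    rng = random.Random(seed)
    cage_ids = list(cages)
    rng.shuffle(cage_ids)                 # randomize cage order over time
    schedule = []
    for cage_id in cage_ids:
        animals = list(cages[cage_id])
        rng.shuffle(animals)              # randomize animals within the cage
        schedule.extend((cage_id, animal) for animal in animals)
    return schedule


# Hypothetical layout: two female cages and two male cages, four animals each.
cages = {
    "F1": ["F1-a", "F1-b", "F1-c", "F1-d"],
    "F2": ["F2-a", "F2-b", "F2-c", "F2-d"],
    "M1": ["M1-a", "M1-b", "M1-c", "M1-d"],
    "M2": ["M2-a", "M2-b", "M2-c", "M2-d"],
}

schedule = run_order(cages, seed=2018)
for position, (cage_id, animal) in enumerate(schedule, start=1):
    print(position, cage_id, animal)
```

    Shuffling the cage order reduces the chance that one sex is always processed first, but it does not remove confounding within a cage, for example where all animals in a cage receive the same treatment. Such residual weaknesses would still need to be acknowledged.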

    Significant resistance to studying both sexes may also have arisen from the erroneous belief that to do so would double the sample size and therefore be prohibitively expensive. The doubling of sample size is a misconception as the following argument demonstrates. If we are studying a treatment effect (such as tumour growth rate) and add sex as a second variable, we are moving from a completely randomized design, typically analysed with a Student's t-test to assess for statistical significance, to a factorial design where a two-way ANOVA would be appropriate. Factorial designs are recommended as these are more efficient (Shaw et al., 2002). The difference in sensitivity arises from a fundamental difference in the statistical analysis implemented. In a classic completely randomized design, there is a direct comparison of a small number of individual experimental conditions. In a factorial design, the estimation of main effects and interactions through an ANOVA means individual conditions are never directly compared. Instead, the main effects and interactions are estimated by comparing combinations of experimental conditions. Therefore, the power of a factorial design does not come from the individual conditions but rather from the sample size across all experimental conditions. Another way of assessing the power is to consider the degrees of freedom, that is, the number of values in the final calculation of a statistic that are free to vary, which correlates with power. With a Student's t-test, the analysis only estimates one element (the treatment effect) and therefore uses one degree of freedom. With a two-way ANOVA, the analysis is now estimating three elements (treatment, sex and an interaction of treatment with sex effect) and uses three degrees of freedom. Therefore, in the factorial design, you gain a lot more information from the study but are effectively only losing 2 degrees of freedom from the sensitivity of the test. This is why McCarthy suggested that you mirror your original design but change half the animals in your study to female (McCarthy, 2015). In a worked example, McCarthy proposed that if your original study comparing a treatment to a control with one sex had 12 animals (six animals per treatment group), you should instead study 16 animals (four animals per sex per treatment group) (McCarthy, 2015). How we count animal usage also plays a role. Within a single experiment, the number of animals has slightly increased in the case study presented by McCarthy. If we think more holistically about the number of animals, providing animals of only one sex means that many animals bred are not used and are culled. Furthermore, if we use both sexes, we have the potential to reduce the colony size needed to generate the animals for our studies.
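
    The degrees-of-freedom argument can be checked with simple bookkeeping. The sketch below is a minimal illustration, assuming a balanced design and a full factorial model with interaction; the `Design` class and its property names are hypothetical helpers introduced here, not part of any cited method. It compares the single-sex design (12 animals), McCarthy's 16-animal example and a 12-animal design split by sex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A balanced design: every treatment-by-sex cell holds `per_cell` animals."""
    treatments: int
    sexes: int
    per_cell: int

    @property
    def n_total(self) -> int:
        return self.treatments * self.sexes * self.per_cell

    @property
    def model_df(self) -> int:
        # Terms estimated beyond the grand mean: 1 for a two-group t-test,
        # 3 for a 2x2 ANOVA (treatment, sex, treatment-by-sex interaction).
        return self.treatments * self.sexes - 1

    @property
    def residual_df(self) -> int:
        # Degrees of freedom left to estimate the error variance.
        return self.n_total - self.treatments * self.sexes

    @property
    def per_treatment_arm(self) -> int:
        # Animals contributing to each level of the treatment main effect.
        return self.sexes * self.per_cell


single_sex = Design(treatments=2, sexes=1, per_cell=6)  # original 12-animal study
both_sexes = Design(treatments=2, sexes=2, per_cell=4)  # McCarthy's 16-animal example
same_total = Design(treatments=2, sexes=2, per_cell=3)  # 12 animals split by sex

for name, d in [("single sex", single_sex),
                ("both sexes", both_sexes),
                ("same total", same_total)]:
    print(f"{name}: N={d.n_total}, model df={d.model_df}, "
          f"residual df={d.residual_df}, per treatment arm={d.per_treatment_arm}")
```

    At a fixed total of 12 animals, adding sex and its interaction costs two residual degrees of freedom (10 versus 8). In the 16-animal example, each treatment arm contains eight animals rather than six, because the treatment main effect is estimated across both sexes.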

    During the experimental design stage of research, we have identified significant sources of resistance that include practical obstacles and ingrained beliefs that the current approach is essential to meet our ethical obligations. A survey assessing researchers' views on scientific rigour found that many scientists were not following recommended practice because they did not perceive it was relevant to their situation (Reichlin et al., 2016). A common resistant force is the ‘well, we have always done it this way’ attitude, with the implicit concept that it has been alright so far. Significant resources focusing on challenging this belief will be needed to strengthen the burning platform, which is critical as a starting point for change.

    Resistance arising in the statistical analysis of in vivo studies

    When studies do incorporate two sexes in the design, it has been observed that the subsequent analysis typically pools the data assuming there are no differences between the sexes either in the variable of interest or in the effect of the treatment (Beery and Zucker, 2011). In a classic two-group comparison, a Student's t-test will test the hypothesis that there is no difference in group means. This test will be assuming that each reading is independent and comes from one homogenous population. If sex is a significant source of variation, then the data are being drawn from two separate populations. At this point, the assumptions of the t-test are not being met and the ability to identify and assess significance of the treatment effect is flawed. To demonstrate that pooling data can obscure the true biological relationship, data were artificially generated using R (a statistical programming language) where the treatment effect depended on the sex of the animals. These data were then used to illustrate the effect of collapsing the data across the sexes on both the statistical and visual outcome (Figure 3).

    Figure 3. Effects of pooling data when sex has a significant effect on treatment outcome. Data were artificially generated to demonstrate the effects of pooling when studying a system where the treatment effect does depend on the sex of the animal. (i) When data were pooled, no treatment effect was apparent either visually or statistically (P = 0.3716, two-sided Student's t-test). (ii) When graphed by sex (red = female, cyan = male), we can see a significant treatment effect for the females (P = 0.0193, F test from a two-way ANOVA detecting an effect of 1.1 ± 0.4 standard error). Data were generated in R (a statistical programming language) by random sampling from a normal distribution to give five animals per sex per treatment group. A mean signal of 5 units was set for males and 5.5 for females. All groups were set to have a SD of 1 unit. A treatment effect of 1 unit was specified to occur but only for females.
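
    The simulation described in the caption can be approximated in a few lines. The authors generated their data in R; the sketch below instead uses the Python standard library, and the seed, function names and data layout are assumptions made for illustration. It will not reproduce the P values quoted in the caption. It only contrasts the pooled treatment estimate with the sex-specific estimates.

```python
import random
from statistics import fmean

SEXES = ("male", "female")
ARMS = ("control", "treated")


def simulate(seed=1, n_per_cell=5, sd=1.0):
    """Draw normal data using the parameters stated in the figure caption."""
    rng = random.Random(seed)              # seed is an arbitrary assumption
    baseline = {"male": 5.0, "female": 5.5}
    effect = {"male": 0.0, "female": 1.0}  # treatment acts in females only
    data = {}
    for sex in SEXES:
        for arm in ARMS:
            mean = baseline[sex] + (effect[sex] if arm == "treated" else 0.0)
            data[(sex, arm)] = [rng.gauss(mean, sd) for _ in range(n_per_cell)]
    return data


def treatment_effects(data):
    """Estimate the treatment effect pooled across sexes and within each sex."""
    by_sex = {
        sex: fmean(data[(sex, "treated")]) - fmean(data[(sex, "control")])
        for sex in SEXES
    }
    pooled_treated = [x for sex in SEXES for x in data[(sex, "treated")]]
    pooled_control = [x for sex in SEXES for x in data[(sex, "control")]]
    return {
        "pooled": fmean(pooled_treated) - fmean(pooled_control),
        "male": by_sex["male"],
        "female": by_sex["female"],
        # Difference between the sex-specific effects (the interaction).
        "interaction": by_sex["female"] - by_sex["male"],
    }


estimates = treatment_effects(simulate(seed=1))
for name, value in estimates.items():
    print(f"{name:>11}: {value:+.2f}")
```

    In a balanced design, the pooled estimate is the average of the two sex-specific effects, so a 1 unit effect present only in females is diluted to about 0.5 units once the data are pooled, while the baseline difference between the sexes is absorbed into the apparent noise.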

    The pooling of data could reflect a position where investigators are assuming that sex differences are not significant, or it could reflect a skill gap, in that investigators avoid more complex analyses because they do not know how to handle them. Kotter identified that a skill shortage is a common blocker leading to resistance in a change initiative (Kotter, 2012). Several reviews have established that statistical errors are common in science publications (Weissgerber et al., 2016). Culturally, statistics support is often sourced at the end of a study. As Ronald Fisher, a famous statistician who has been described as ‘a genius who almost single-handedly created the foundations for modern statistical science’ (Hald, 1998), said, ‘To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of’ (Fisher, 1938). As a community, we need to plan the analysis prior to the collection of the data. This will also address the issue of ‘p-hacking’ or ‘data-dredging’, the process of amending the statistical plan until statistical significance is obtained, which is recognized to contribute to the replication crisis (Head et al., 2015). The importance of planning is well established, yet culturally within science the focus is typically on data collection and the analysis is expected to be sorted out later. Several approaches exist that can help raise the standard of experimental planning. For example, the Experimental Design Assistant (Percie du Sert et al., 2017), a free-to-use web application, constructs a visual representation of the experiment and uses computer-based logical reasoning to provide feedback and advice on the experimental plan, including the analysis. An alternative is a documented review of the experiment (i.e. a proforma) which considers key design principles and links the goals, design and analysis. This strategy was implemented across AstraZeneca for all in vivo studies to ensure detailed planning is carried out prior to the onset of experimentation, and a systematic review found greater confidence in the studies being robust (Peers et al., 2014).

    A challenge with changing scientific practice, such as how we analyse data, is that it implicitly criticizes the way things have been done. Furthermore, changing practice places people outside their comfort zone and highlights skill shortages. Together, these issues lead to resistance. Understanding the source of resistance can help develop strategies to be applied in step five of Kotter's eight-step process. A consultation strategy, recommended by Yukl (Yukl and Chavez, 2002) as an effective strategy leading to commitment, is needed to explore within a department how data could be analysed and presented. Generating good practice guidelines and example layouts, and identifying or building easy-to-use tools to support analysis, breaks down the barriers and enables change to be implemented. It also legitimizes the new approach to data analysis.

    The primary strategy being proposed to address the skill gap in data analysis is statistical training. This training needs not only to teach scientists the basic skills but also to help them recognize when they need to ask for help. Within an institute, provision of alternative statistical expertise, such as a consulting statistician, is needed. It is important to understand that a statistician plays a complementary role and that success will be achieved through collaboration between statistician and investigator. A key competency of ethical review bodies (e.g. Animal Welfare Ethical Review Body or Institutional Animal Care and Use Committee) is statistics and experimental design, which has resulted in many recruiting a statistician to the team (RSPCA and LASA, 2015). However, addressing the technical skills shortage alone will not be effective due to the cultural aspects around statistics. Culturally, in many countries, it is acceptable to say that one hates mathematics, and research has found that the strongest predictor of attitude towards statistics was how well an individual had performed in mathematics in the past (Hannigan et al., 2014). There is growing evidence that an individual's attitude towards statistics when embarking on statistical training contributes significantly to the outcome and to whether the training is effective (Onwuegbuzie, 2003). It is ironic that while it is generally acknowledged that statistics is hard, as a community, we often give insufficient time and resources to the analysis phase, focusing our time on collecting the data. This suggests a lack of value associated with statistical analysis, possibly arising from the lack of understanding of what it can bring (Gigerenzer, 2004). The lack of value placed on statistics has been noted in, for example, medical students who reported neutral perceptions about the value of biostatistics and their interest in statistics (Stanisavljevic et al., 2014). Training alone is therefore not going to address the issue; we need a change in mindset. This affects not just this question but also the crisis of replication within science. Significant promotion of the value of statistics from leadership teams is critical for any progress to be made.

    Resistance arising in the reporting of in vivo studies

    Poor quality of the reporting of in vivo experiments has been highlighted as a critical issue in addressing the replication of in vivo research (Kilkenny et al., 2009). Reporting issues have been raised in describing the experiment, in how the data were analysed and in the discussion of the results (Kilkenny et al., 2009). In regard to the sex of the animals, surveys of publications have found that between 22 and 26% of papers did not report the sex of animals used within the main study of the paper (Kilkenny et al., 2009; Yoon et al., 2014). The concerns over reporting standards have led to the development of the ARRIVE guidelines, a checklist of 20 items describing the minimum information that all scientific publications should include, in which the specification of the sex of the animals used is explicitly requested (Kilkenny et al., 2010). The guidelines were well received by the community, as shown by over 1000 journals (May 2018) endorsing the guidelines in their manuscript preparation guidelines. Research has found mixed results on the outcomes of the guidelines on the quality of reporting (Baker et al., 2014; Flórez-Vargas et al., 2016; Sena, 2017). However, reporting rates of key strategies to improve experiments are improving (Macleod et al., 2015). Knowledge of the ARRIVE guidelines has been shown to have a positive effect on scientific rigour, but within the community surveyed more than half had never heard of the guidelines (Reichlin et al., 2016). It is important to note that the ARRIVE guidelines only encourage the specification of the sex of the animals and as yet make no mention of the analysis. Not only do we need to report the sex studied, we need to present the data by sex and incorporate this within the reported analysis.

    Raising awareness of ARRIVE within an institute would therefore improve the quality of reporting globally, which would indirectly improve the reporting around the sex of the animals within a study. If we reflect on the observation (Yukl and Chavez, 2002) that rational persuasion is a strategy which leads to resistance, then the classic scientific approach of scientific talks or manuscripts will be ineffective by itself. Instead, we will need to wrap these approaches within a multi-pronged approach drawing on alternate strategies (such as legitimizing or inspirational appeals) to lead to commitment and therefore a successful transition. Engaging with the leadership teams (research groups, ethical review groups, senior management) to endorse the guidelines can give weight and momentum to drive change, enabling the expectation to become embedded within an institute's culture. Finally, implementing a system which allows scientists to provide assurance that they have considered the ARRIVE guidelines in their manuscript preparation positively reinforces the value the institute places on ARRIVE and the expectations the institute has of its scientists.

    Resistance arising for in vitro studies

    Many of the resisting forces discussed with in vivo studies will be equally relevant to in vitro studies. For example, the resistance arising from anxieties on how to analyse and present data will be a common issue for both types of study. In other areas, studying both sexes will be less challenging as the welfare constraints seen with in vivo studies are removed. The major blocker appears to be cultural. It has been postulated that the disregard for the sex of the cells arises because most scientists will not typically know the sex of their cells and will perceive that the sex of the cells is irrelevant (Shah et al., 2013). This perspective can be explained if we think that sex differences predominantly derive from hormonal differences. However, research has shown that sex differences exist between cells before the onset of hormonal exposure (Shah et al., 2013). Cells do have a sex, which can be determined through PCR studies. However, there is a limitation with immortalized cell lines, as determining the sex can be challenging due to the instability of chromosomes and contamination (Mazure, 2016). With immortalized cell lines generated from one individual, the differences seen between sexes might arise not from a sex effect but from an individual difference (Ritz et al., 2014); both this limitation of such studies and the potential for sex differences need acknowledging. Shah et al. highlight the observation that it is well known that the disease progression of cystic fibrosis is more acute in females. For example, females under the age of 20 have a 60% greater chance of dying than males, and yet the experimental systems utilizing primary human airway cells to screen compounds make no mention of the sex of the patient from whom the airway cells were obtained (Shah et al., 2013). Increased use of studies with primary cells, where sex is easily identifiable, means we can consider the sex of the sample.

    Embracing sex as a biological variable is an institute challenge

    Failure to account for sex as a biological variable is embedded in our research culture and is apparent throughout the research pipeline. To successfully drive change, we need to think big and long term within each of our institutes. The vision is clear: ‘To effectively translate our research we will automatically implement sex sensitive approaches in study design, analysis and data presentation’. The lack of progress to date and awareness of the challenge of change highlight the need to invest appropriately, winning first the minds of leaders, to build a coalition that can drive the change forward. Success will only be achieved if a multi-strategy, holistic approach is used and the process is cyclical rather than a series of one-off wonders.

    There are many strategies that could be considered. Our institutes differ in size, international location, focus (academic or commercial) and starting cultural position on this topic. For all, a good starting point would be a force field analysis looking at the resistance and driving forces in operation. From Kotter's eight steps, we can see general strategies to follow but the fine detail is going to be specific to your institute and its culture. Some strategies that could be useful have been suggested in Table 2.

    Table 2. Strategies an institute could consider addressing the force imbalance maintaining the status quo
    Area                    Strategy
    Cultural                Highly visible statements of intent published in company literature
                            Where necessary, changes to governance structures may be required to facilitate the necessary change
                            Engage the community in development of strategies, e.g. development of a design proforma
                            Raise awareness through presentations that include area specific examples
                            Legitimize: invite speakers
                            Award examples of good practice
                            Pledge a commitment to follow expectations
                            Encourage uptake of the free online training program (Canadian Institutes of Health Research & Health, 2018)
                            Consulting: feasibility studies
                            Publishing successful or ground-breaking case studies
    Experimental design     Experimental Design Assistant
                            Develop design proforma
                            Optional training: experimental design
                            Mandatory training: experimental design
                            Ethical review bodies develop default position on studying both sexes
                            Ethical review bodies ensure studies with one sex only must be justified and approved
    Statistical analysis    Training: statistics and data visualization
                            Consider replication and quality of statistics whilst reviewing papers in journal clubs
                            Develop area specific pipelines/templates with examples
                            Drop in statistics clinic/facilitate access to biostatistician
    Reporting               Training on good experimental reporting
                            Develop area specific examples and templates
                            Develop ARRIVE assurance mechanism
    This table includes some ideas of strategies that can weaken the resistant forces against, and strengthen the driving forces for, change. This list is not exhaustive or comprehensive, as our institutes vary considerably in style and culture, but it provides some ideas as a starting point.

    Conclusions

    With the increasing emphasis on precision medicines and a need to reduce the attrition rate of drugs in the drug development pipeline, it is critical for the community to embrace sex as a biological variable in the preclinical research pipeline. Consequently, we need to create a scientific environment in which our research pipelines automatically implement sex-sensitive approaches. This is essential to improve the value of our research to the health care of women but also to address the wastage due to the poor translation and replication issues prevalent in our research.

    Despite mounting evidence, the sex bias has been maintained and little progress is apparent in addressing the issue. We present here an exploration of how the application of organizational change theory driven by effective leadership can support this change. Significant commitment across all levels of leadership in an institute is needed to form a coherent vision and strategy for change. This should then be implemented and role-modelled, in order to build belief that it is the right direction. This is not a problem for individual scientists to solve in isolation. Institutes should facilitate the building of coalitions, so that scientists can become agents for change, with the sponsorship and endorsement of leaders. The sex bias is embedded in our culture, but the theories discussed in this paper show that there is a way to move forward with a collective approach. This is not going to be easy but the vision is clear and we should meet the challenge and start the journey.

    Conflict of interest

    N.A.K. and N.R. are employees of AstraZeneca.