Systematic exploration of predicted destabilizing nonsynonymous single nucleotide polymorphisms (nsSNPs) of human aldehyde oxidase: A Bio‐informatics study

Abstract Aldehyde Oxidase (hAOX1) is a cytosolic enzyme involved in the metabolism of drugs and xenobiotic compounds. The enzyme belongs to the xanthine oxidase (XO) family of Mo containing enzyme and is a homo‐dimer of two 150 kDa monomers. Nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) of hAOX1 have been reported as affecting the ability of the enzyme to metabolize different substrates. Some of these nsSNPs have been biochemically and structurally characterized but the lack of a systematic and comprehensive study regarding all described and validated nsSNPs is urgent, due to the increasing importance of the enzyme in drug development, personalized medicine and therapy, as well as in pharmacogenetic studies. The objective of the present work was to collect all described nsSNPs of hAOX1 and utilize a series of bioinformatics tools to predict their effect on protein structure stability with putative implications on phenotypic functional consequences. Of 526 nsSNPs reported in NCBI‐dbSNP, 119 are identified as deleterious whereas 92 are identified as nondeleterious variants. The stability analysis was performed for 119 deleterious variants and the results suggest that 104 nsSNPs may be responsible for destabilizing the protein structure, whereas five variants may increase the protein stability. Four nsSNPs do not have any impact on protein structure (neutral nsSNPs) of hAOX1. The prediction results of the remaining six nsSNPs are nonconclusive. The in silico results were compared with available experimental data. This methodology can also be used to identify and prioritize the stabilizing and destabilizing variants in other enzymes involved in drug metabolism.


| INTRODUC TI ON
SNPs are modifications occurring in a specific region of the genome, on a single nucleotide. There has been great interest in SNPs discovery since these are responsible for the majority of the genetic variations among the human population, and are expected to facilitate large-scale genetic association studies. 1  Although the majority of the nsSNPs are phenotypically neutral, some of them can alter the structure and function of a protein, leading in some cases to disease associated conditions. [2][3][4] Aldehyde oxidase (AOX) belongs to the XO family of mononu- connecting domain I to II, and domain II to III, respectively ( Figure S1).
The true physiological function of hAOX1 is unclear but it is known to be responsible for the failure of several phase I clinical trials, due to its diverse catalytic activity that include oxidations, hydrolysis of amides, and reductions. [5][6][7][8][9] The interest in hAOX1, as a drug-metabolizing enzyme, has increased in the past decade since its activity affects the metabolism of different drugs and xenobiotics, some of which designed to resist other metabolizing enzymes (eg cytochrome P450 monooxygenase isoenzymes). In the 1990s, studies using human liver extracts showed variations in the oxidation of known hAOX1 substrates, such as N1-methylnicotinamide and benzaldehyde, 10 and since then, differences in hAOX1 activity have been attributed to factors such as gender, age, cigarette smoking, drug usage, and disease states. 7 Although more than 700 SNPs are reported in dbSNP, 11 only few of them were studied in detail. Smith and coauthors 12 suggested that N1135S polymorphism affects the metabolism of azathioprine, which might lead to nonresponse in the treatment of inflammatory bowel F I G U R E 1 Overall representation of the hAOX1 homodimer and location of the most relevant nsSNPs in the hAOX1 structural domains. Domains I, II, and III are colored in blue, green, and orange, respectively   Interface   linker1   linker2   G177E   Q776V, R802C,  R802H, A806V,  R921H, G924A,  S1060R, S1089P,  R1231H, K1237N,  G1269R, S1271L   C44W, G48V,  G50D, C52G,  T53I, N72K,  R150H, R150G   R32K, G595E, P762L, A1026T,  A1028P, A1028V, Y1033C,  A1102D, R1109C, R1109H   G346R, H358P,  F396C, R429Q,  R433Q, A439E disease. Moreover in an independent study of the functional characterization of variants allowed the classification of the individuals mainly as fast-metabolizers, poor-metabolizers and individuals with no effect on the catalytic efficiency of hAOX1. 13 Recently, Foti et al showed, using biochemical assays, that hAOX1 activity is not altered for some nsSNPs (S1271L, H363Q, A437V, L438V), while in other cases, the protein i) is unstable and cannot be produced (C44W); ii) is inactive (G1269R); iii) shows increased activity (G46E) and iv) shows decreased activity (G50D, R433P, G346R, A439E, R1231H,   K1237N). 14,15 The only disease associated condition related with hAOX1, known so far, is the disorder of the purine metabolism caused by XO and AOX combined deficiency and named type II xanthinuria.
Mutation of human Moco sulfurase gene is responsible for classical type II xanthinuria, due to the failure of the mechanism responsible for inserting the essential sulfur atom into the active center of hAOX1 and XO. 16 The presence of nsSNPs in hAOX1 leading to loss of the Moco insertion may also be related with the presence of type II xanthinuria disease conditions and should be further investigated.
To the best of our knowledge, the only reported pharmacoge-

| Nonsynonymous nsSNPs analysis
We submitted the protein sequence of hAOX1 to eight bioinformatics tools to predict the functional consequences or putative phenotype effects of the nsSNPs: I-Mutant 2.0, 22 PolyPhen 2.0, 23 nsSNPAnalyzer, 24 PhD-SNP, 25 Panther, 26 SNPs&GO, 27 PROVEAN, 28 and SIFT. 29 We classified the nsSNPs as deleterious or nondeleterious by comparing the results obtained from all programs and when concordance was obtained for at least six of eight programs used. The prediction accuracy was improved by performing the concordance analysis of nsSNPs using the tools mentioned above.

| Protein stability analysis
To evaluate the nsSNP-induced changes on protein stability, we submitted the sequence and structure of hAOX1 to the following web servers: I-Mutant 3.0 (sequence or structure-based), 30 INPS (sequence or structure-based), 31 DUET (structure based), 32 SDM (structure based), 33 mCSM (structure based), 34 and MuPRO (sequence and structure-based). 35 The nsSNPs were predicted as destabilizing, stabilizing or neutral, in the last case if no effect on the protein structure was predicted, by comparing the values of the free energy change (ΔΔG) obtained by at least four of eight programs.
The datasets used for all predictor programs were obtained from ProTherm, which is a comprehensive collection of thermo-

| Localization of the nsSNPs in the crystal structure
All the nsSNPs predicted to be deleterious in at least six of eight different in silico tools used and found to be simultaneously validated in the NCBI-dbSNP database, were mapped in the crystal structure of hAOX1 using Coot 38 and PyMol. 39 Also, the LigPlot program 40 was used to identify all the residues interacting with the protein cofactors.

| SNPs identification and stability analysis
As to date, in the NCBI-dbSNP database, a total of 769 SNPs was found in hAOX1, from which 526 belong to the nonsynonymous functional category and were further selected for the analysis (Figure 2A).
Detailed experimental investigation for understanding the functional effects of all nsSNPs is a time-consuming and cumbersome process.
Bioinformatics tools were therefore used to identify and prioritize the  (Figure 2A and B). The prediction results of remaining 315 nsSNPs are nonconclusive and hence they were excluded for further stability analysis. All the 119 deleterious variants are described in Table S1, including the Minor Allele Frequency (MAF) details and predicted ΔΔG values from all the programs.
To predict the protein stability-changes induced by the presence of polymorphism in the 119 putative deleterious nsSNPs, we used a series of sequence and structure-based stability prediction programs (six programs, eight outputs in total), as detailed in the Materials and Methods and Supplementary section. Stability analysis results showed that, out of the 119 deleterious variants, 104 might be responsible for destabilizing the protein structure. In contrast, five nsSNPs are predicted to have a stabilizing effect on the protein structure, namely T53I, H100L, Q776V, T1053I, and S1271L.

| Structural localization of nsSNPs in hAOX1
In this study, of 119 putative deleterious nsSNPs, we analyzed only the 37 nsSNPs according to their location in the different domains F I G U R E 2 Screening of hAOX1 nsSNPs available at the NCBI-dbSNP data base using concordance analysis: (A) overall statistics; (B) prediction of putative phenotypic effects of 526 nsSNPs for hAOX1 using 8 programs. The nsSNPs were classified as deleterious (D) or nondeleterious (ND) if concordance in at least 6/8 programs was obtained. * represents nonconclusive ( Figure 1). The remaining nsSNPs, located at the protein surface, are listed in Table S1. Figure 1 Figure 3C and Table   S1). This is in agreement with previous experimental results where hAOX1-C44W variant could not be produced by heterologous expression. 14 The inability to synthesize a stable C44W protein vari-  first and crucial steps in protein maturation and its lack may lead to a structurally disordered enzyme, in which the cofactor cannot be inserted. 42 In domain I, we can also find nsSNPs vicinal to the FeS coordinating cysteines (G48V, G50D, T53I, N72K, and R150H/G) ( Figure 3A and B). The G50D variant is predicted to be deleterious, showing a ΔΔG value of −0.71 kcal/mol, destabilizing the protein structure (6/8 programs) (Table S1 and Figure 3C). This is in agreement with a recent study where this mutant was biochemically characterized showing similar Km values as the wild type, but 20% reduced activity towards oxidation reactions. 15 The substitution of G50 by a larger, charged residue (Asp) would result in destabilization of the neighboring residues, with direct impact on Cys49 residue, which is coordinating [2Fe-2S] II. The same study includes G46E variant, which also shows a similar Km but a 2-fold increase in k cat , suggesting that this variant does not have a negative effect on the protein. 15 Foti and coauthors suggested in 2017 that G46E gives rise to a variant with increased oxidation activity. 15 Although the nature of the side chains of Gly and Glu residues are very different, and a large impact in the structure could have been anticipated, our in silico results predict that this nsSNP is nondeleterious, corroborating the experimental data. 15 The loop that harbors G46 is in the electron Its mutation to a smaller residue (His or Gly) will prevent this type of interaction, probably destabilizing the cofactor-binding region.
The bioinformatics analysis here performed predicts that this mutation will have a deleterious effect, more pronounced for Gly than

| nsSNPs at the linker I
Only the G177E nsSNPs was found to be in the linker1 region of the protein. In the hAOX1 crystal structure, this is a mobile region that could not be defined in the electron density map (residues  and therefore the structural effect of this nsSNP cannot be anticipated by a simple structural analysis. 43 However, according to this in silico tools, this nsSNP is deleterious and affects the stability of protein structure with the predicted ΔΔG value of −0.93 kcal/mol (Table S1 in Supplement).

| nsSNPs at the hAOX1 domain II
The FAD cofactor in domain II of hAOX1 is responsible for transferring the electrons generated during the catalytic reaction to the terminal electron acceptor, which is molecular oxygen (Figure 1). A total of 21 nsSNPs were found to be located at the FAD domain, which corresponds to 18% of total number of predicted deleterious nsS-NPs (Table S1). Of these, 17 variants are predicted to have a destabilizing effect on the protein structure, whereas only one (H340Y) corresponds to a neutral variant (Table S1) Two nsSNPs, R433Q and A439E are part of the FAD variable loop 1 (Q 430 AQRQENALAI 440 ), which is described in XO as responsible for the XO-XDH inter-conversion. 44 These residues are conserved in mAOX3 and bXO/XDH, and the crystal structure of hAOX1 shows the side chain of R433 residue is H bonded to main chain atoms of the loop. Our in silico analysis suggests that these two variants (R433Q and A439E) will highly destabilize the protein (with ΔΔG values of −1.05 and −0.82 kcal/mol, respectively) and experimental data showed that R433P and A439E variants have decreased oxidation activity, with 4 and 5 times lower k cat compared to the wt enzyme, respectively. 15 The recent kinetic 15 and structural (not published) data concerning the A437V and L438V nsSNPs from the variable loop 1 showed interesting results. Our computational studies predicted that these correspond to nondeleterious nsSNPs, which is in accordance with the steady-state kinetic results that have shown that the Km values of the variants are comparable to that of the wt. 15 In contrast, the value of k cat resulted in a similar value for A437V, and a 1.2 increase for L438V. 15 During the hAOX1 catalytic oxidation, reduction of oxygen species occurs, and H 2 O 2 and O 2− are released. 45,46 The authors observed that a significantly increased rate of superoxide production was obtained for the L438V variant (72%). 15 In the FAD domain we can also find buried nsSNPs, located close to the FAD and to the [2Fe-2S] I cluster, that destabilize the protein structure. This is the case of G346R variant, where the introduction of a bulky and charged side chain is predicted to have a deleterious effect (ΔΔG −0.05 kcal/mol). This prediction is in agreement with experimental data on G346R, whose catalytic activity is decreased (also 4 times lower k cat when compared to wt enzyme). 15

| nsSNPs at the hAOX1 domain III
The majority of hAOX1 nsSNPs (65%) are located in the C-terminal region of the Moco domain III, with 71 destabilizing variants and three stabilizing variants (Q776V, T1053I and S1271L). The prediction results of the three variants T594M, T706I, and A1167V in this domain are nonconclusive, while variants R906W and S1194N are predicted as neutral. The detailed results of this prediction are given in Table S1 and Figure 5C.
The nsSNPs near the Moco active site and surrounding region are the most relevant ones for enzyme activity and substrate specificity. Some of these nsSNPs have been expressed and characterized, namely R802C, R921H, G1269R, and S1271L (Table S1 and Figure 5C). 13,14 Interestingly, in the first report concerning the presence of nsSNPs in hAOX1, it was shown that the R802C nsSNP was predominantly purified in its monomeric form in solution, resulting in a higher proportion of the inactive form of the enzyme. 13 Nevertheless, the catalytic efficiency of the dimeric portion of the protein was not affected and similar values were obtained in comparison to the wt hAOX1. In the hAOX1 crystal structure, residue R802 from chain B establishes a salt bridge with E765 residue from chain A, and this could explain why the R802C variant was predominantly obtained in its monomeric form. In fact, R802 residue gives rise to two variants, R802C and R802H, both predicted to destabilize the protein structure (ΔΔG of −0.58 and −0.79 kcal/mol, respectively), probably because the mutated side chains cannot promote inter-molecular electrostatic interactions.
In the case of G1269R variant, experimental results have shown that, although metal analysis suggests that all cofactors are present, the enzyme is inactive. 14 Gly1269 residue is located immediately before the catalytically essential Glu1270 and its modification to a large and positively charged residue as Arg, will likely promote large structural changes to allow accommodating the bulky side chain, affecting the interactions between the cofactor and the surrounding residues, hence impacting protein stability and lack of activity (ΔΔG: −0.76 kcal/mol).
The S1271L nsSNP, also adjacent to the catalytically essential Glu1270 residue, was so far identified to be heterozygous in all the tested individuals 13 and, according to six of eight programs used, is predicted to be stabilizing the protein structure (Table S1 and Figure 5C). Biochemical analysis and kinetic data however, showed that the resulting protein variant has similar characteristics to the wt enzyme. Also, the crystal structure of this variant, the first published structure of a hAOX1 nsSNP (PDB ID: 5EPG), showed no major structural deviations. 14 S1089P is another destabilizing variant, with a predicted ΔΔG value of −0.75 kcal/mol. The hAOX1 crystal structure shows that Ser side chain is hydrogen bonded to Glu1270 residue and is part of the loop that surrounds the pyranopterin dithiolene moiety (S1086-S1089). A Pro residue at this position is likely to disrupt the structure of the loop, and impair the intra-molecular interactions required for cofactor stabilization.
Also, close to the Mo active site, we find R921H, G924A and A806V variants. R921 side chain is almost parallel to the pterin moiety of Moco (at ca 3-4 Å) and is in disallowed regions of the Ramachandran plots of all XO family members' crystal structures.
The ΔΔG value obtained for this residue (−0.76 kcal/mol), suggests that its substitution by a His residue will possibly prevent cofactor accommodation producing a less active enzyme. In the same loop, we find G924A, also a destabilizing variant (ΔΔG: −0.62 kcal/mol).
A806 residue is positioned sideways to the pterin rings, H bonded to O4 atom via its amino group. In silico analysis suggests that its replacement by a larger residue will impact, although to less extent, the stability of the structure (ΔΔG = −0.44 kcal/mol).
Another interesting nsSNP at the active site is Q776V, but so far, no kinetic data are available. Q776 residue is a highly conserved residue among XO and AOX family members, 9,47,48 that makes an H bond with the Mo apical oxygen ligand (2.9 Å) ( Figure 5A and B).  Domain III includes also the noncompetitive inhibition site, where thioridazine, and likely other phenothiazine family members, binds. One of the loops at the surface of the protein that accommodates the inhibitor contains S1060 residue and its variant S1060R is predicted as deleterious and structure destabilizing variant. Showing a high ΔΔG value (−0.94 kcal/mol), this variant's side chain will very likely unstructure the inhibitor pocket and decrease the binding affinity, influencing the inhibitory effect of this family of molecules.
This might be related with the mechanisms of mutation induction, and no further evidence could be found to explain this occurrence.

| nsSNPs at the dimerization interface
Several nsSNPs are found at the dimerization interface or very close to it: R32K, G595E, P762L, A1026T, A1028P/V, Y1033C, A1102D, and R1109C/H ( Figure 1 and Table S1). All these variants are predicted to be deleterious and destabilizing the structure of the protein, showing high ΔΔG values. It is worth mentioning R32K: in the crystal structure of hAOX1 its side chain while pointing towards the protein interior, is involved in a salt bridge with D601 and D602 residues of the same monomer and 8/8 programs suggest that its mutation to a Lys residue will have a deleterious effect. Although this variant corresponds to a mutation with an equally charged residue, its side chain is slightly shorter and with a higher degree of flexibility, which might be enough to diminish electrostatic interactions, unstructuring this region, and destabilizing the protein ( Figure 3C and

| D ISCUSS I ON AND CON CLUS I ON S
All residues located at the active site, near the protein cofactors, noncompetitive inhibition site, or in the electron transfer pathway, have a higher probability of affecting the enzyme activity or disrupting the protein secondary and/or tertiary structure, when replaced by a different residue. On the contrary, modification of residues located at the protein surface should cause no major impact on the overall structure, catalytic activity or substrate specificity of the enzyme. Several amino acid residues from hAOX1 have been identified as important for its putative biological function. In particular those located at the protein Mo active site, near the [2Fe-2S] centers and FAD cofactor, as well as in the electron transfer path, also important for the enzyme's catalytic activity. 43 Our systematic bioinformatics analysis reported 119 deleterious and 92 nondeleterious nsSNPs in hAOX1. Due to the very few studies on this enzyme, the MAF of 119 nsSNPs is less than 1% in the population. However, the obtained in silico data with reference to putative phenotypic and protein structure stability effects may suggest and led to further studies on site directed mutagenesis, biophysical, X-ray crystallographic and pharmacogenetics characterization. To our knowledge, this study constitutes the first extensive analysis for the presence of nsSNPs in hAOX1. We believe that this preliminary investigation provides a systematic route for identification and prioritization of potentially important nsSNPs in hAOX1. It may thus be used, in combination with experimental screening, to select most promising destabilizing variants associated with hereditary disorder related to AOX or XO family members. This investigation will also facilitate future studies on pharmacogenomics and personalized medicine.

ACK N OWLED G EM ENTS
Work supported by Fundação para a Ciência e a Tecnologia/MEC (grant reference PTDC/BBB-BEP/1185/2014). This work was supported by the Applied Molecular Biosciences Unit-UCIBIO